Training: 2022-04-27 01:46:27,189-rank_id: 0
Training: 2022-04-27 01:46:53,870-: margin_list [1.0, 0.0, 0.4]
Training: 2022-04-27 01:46:53,871-: network r100
Training: 2022-04-27 01:46:53,871-: resume False
Training: 2022-04-27 01:46:53,871-: output work_dirs/wf12m_pfc02_r100
Training: 2022-04-27 01:46:53,871-: embedding_size 512
Training: 2022-04-27 01:46:53,871-: sample_rate 0.2
Training: 2022-04-27 01:46:53,871-: interclass_filtering_threshold 0
Training: 2022-04-27 01:46:53,871-: fp16 True
Training: 2022-04-27 01:46:53,871-: batch_size 128
Training: 2022-04-27 01:46:53,871-: optimizer sgd
Training: 2022-04-27 01:46:53,872-: lr 0.1
Training: 2022-04-27 01:46:53,872-: momentum 0.9
Training: 2022-04-27 01:46:53,872-: weight_decay 0.0005
Training: 2022-04-27 01:46:53,872-: verbose 2000
Training: 2022-04-27 01:46:53,872-: frequent 10
Training: 2022-04-27 01:46:53,872-: dali False
Training: 2022-04-27 01:46:53,872-: rec /train_tmp/WebFace12M
Training: 2022-04-27 01:46:53,872-: num_classes 617970
Training: 2022-04-27 01:46:53,872-: num_image 12720066
Training: 2022-04-27 01:46:53,872-: num_epoch 20
Training: 2022-04-27 01:46:53,872-: warmup_epoch 0
Training: 2022-04-27 01:46:53,872-: val_targets []
Training: 2022-04-27 01:46:53,872-: total_batch_size 1024
Training: 2022-04-27 01:46:53,872-: warmup_step 0
Training: 2022-04-27 01:46:53,872-: total_step 248420
Training: 2022-04-27 01:47:18,732-Reducer buckets have been rebuilt in this iteration.
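The derived fields in the configuration dump above (`total_step`, `warmup_step`) can be checked against the raw ones (`num_image`, `total_batch_size`, `num_epoch`, `warmup_epoch`). The following is a minimal sketch of that arithmetic, not the training script's own code. It assumes that the step count per epoch is the floor of `num_image / total_batch_size` (the incomplete final batch is dropped) and that the number of workers equals `total_batch_size / batch_size`. Both assumptions are inferred from the logged numbers rather than stated in the log.

```python
# Values copied from the configuration dump in the log.
num_image = 12720066
batch_size = 128          # per-worker batch size
total_batch_size = 1024   # global batch size across all workers
num_epoch = 20
warmup_epoch = 0

# Assumption: the worker count is the ratio of global to per-worker batch size.
world_size = total_batch_size // batch_size

# Assumption: the last, incomplete batch of each epoch is dropped (floor division).
steps_per_epoch = num_image // total_batch_size

total_step = steps_per_epoch * num_epoch
warmup_step = steps_per_epoch * warmup_epoch

print(f"world_size      {world_size}")
print(f"steps_per_epoch {steps_per_epoch}")
print(f"total_step      {total_step}")
print(f"warmup_step     {warmup_step}")
```

Under these assumptions the computed `total_step` and `warmup_step` reproduce the logged values 248420 and 0.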
Training: 2022-04-27 01:47:24,297-Speed 3331.84 samples/sec Loss 41.3790 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 8192 Required: 100 hours Training: 2022-04-27 01:47:27,387-Speed 3314.89 samples/sec Loss 42.6057 LearningRate 0.1000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 8192 Required: 75 hours Training: 2022-04-27 01:47:30,454-Speed 3340.67 samples/sec Loss 43.2166 LearningRate 0.1000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 8192 Required: 62 hours Training: 2022-04-27 01:47:33,520-Speed 3340.52 samples/sec Loss 43.3005 LearningRate 0.1000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 8192 Required: 54 hours Training: 2022-04-27 01:47:36,601-Speed 3325.10 samples/sec Loss 43.7820 LearningRate 0.1000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 8192 Required: 48 hours Training: 2022-04-27 01:47:39,605-Speed 3410.16 samples/sec Loss 42.6988 LearningRate 0.0999 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 8192 Required: 44 hours Training: 2022-04-27 01:47:42,633-Speed 3381.98 samples/sec Loss 42.7705 LearningRate 0.0999 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 8192 Required: 42 hours Training: 2022-04-27 01:47:45,631-Speed 3417.21 samples/sec Loss 42.5426 LearningRate 0.0999 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 8192 Required: 39 hours Training: 2022-04-27 01:47:48,654-Speed 3388.00 samples/sec Loss 42.8206 LearningRate 0.0999 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 8192 Required: 37 hours Training: 2022-04-27 01:47:51,678-Speed 3388.02 samples/sec Loss 42.6233 LearningRate 0.0999 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 16384 Required: 36 hours Training: 2022-04-27 01:47:54,731-Speed 3354.10 samples/sec Loss 42.6284 LearningRate 0.0999 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 16384 Required: 35 hours Training: 2022-04-27 01:47:57,877-Speed 3256.49 samples/sec Loss 42.5117 LearningRate 0.0999 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-04-27 01:48:00,935-Speed 3349.53 samples/sec Loss 42.2593 
LearningRate 0.0999 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-04-27 01:48:04,033-Speed 3306.98 samples/sec Loss 42.1054 LearningRate 0.0999 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-27 01:48:07,480-Speed 2971.04 samples/sec Loss 42.0537 LearningRate 0.0999 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 16384 Required: 32 hours Training: 2022-04-27 01:48:10,476-Speed 3419.24 samples/sec Loss 41.9583 LearningRate 0.0999 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 16384 Required: 31 hours Training: 2022-04-27 01:48:13,571-Speed 3309.32 samples/sec Loss 41.9055 LearningRate 0.0999 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-27 01:48:16,627-Speed 3351.85 samples/sec Loss 41.9170 LearningRate 0.0998 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-04-27 01:48:19,665-Speed 3372.34 samples/sec Loss 41.7816 LearningRate 0.0998 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 16384 Required: 29 hours Training: 2022-04-27 01:48:22,682-Speed 3394.59 samples/sec Loss 41.7735 LearningRate 0.0998 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-27 01:48:25,757-Speed 3330.61 samples/sec Loss 41.6515 LearningRate 0.0998 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-04-27 01:48:28,790-Speed 3378.14 samples/sec Loss 41.6399 LearningRate 0.0998 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-27 01:48:31,779-Speed 3426.67 samples/sec Loss 41.4791 LearningRate 0.0998 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-27 01:48:34,793-Speed 3398.66 samples/sec Loss 41.3228 LearningRate 0.0998 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-04-27 01:48:37,891-Speed 3306.22 samples/sec Loss 41.2849 LearningRate 0.0998 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 
32768 Required: 27 hours Training: 2022-04-27 01:48:40,944-Speed 3355.17 samples/sec Loss 41.1821 LearningRate 0.0998 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-27 01:48:43,958-Speed 3398.95 samples/sec Loss 41.1447 LearningRate 0.0998 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-27 01:48:46,954-Speed 3419.19 samples/sec Loss 41.0849 LearningRate 0.0998 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-27 01:48:50,015-Speed 3345.62 samples/sec Loss 41.1176 LearningRate 0.0998 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-04-27 01:48:53,016-Speed 3414.02 samples/sec Loss 41.1187 LearningRate 0.0998 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-04-27 01:48:56,043-Speed 3383.86 samples/sec Loss 41.0176 LearningRate 0.0997 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-04-27 01:48:59,103-Speed 3347.96 samples/sec Loss 40.9962 LearningRate 0.0997 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-04-27 01:49:02,149-Speed 3362.75 samples/sec Loss 40.9790 LearningRate 0.0997 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-04-27 01:49:05,158-Speed 3403.64 samples/sec Loss 40.8196 LearningRate 0.0997 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-04-27 01:49:08,178-Speed 3392.30 samples/sec Loss 40.8698 LearningRate 0.0997 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-04-27 01:49:11,201-Speed 3388.15 samples/sec Loss 40.7770 LearningRate 0.0997 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-04-27 01:49:14,233-Speed 3378.78 samples/sec Loss 40.7694 LearningRate 0.0997 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:17,248-Speed 
3396.95 samples/sec Loss 40.6589 LearningRate 0.0997 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:20,266-Speed 3393.94 samples/sec Loss 40.5825 LearningRate 0.0997 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:23,349-Speed 3323.52 samples/sec Loss 40.6289 LearningRate 0.0997 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:26,504-Speed 3246.80 samples/sec Loss 40.6395 LearningRate 0.0997 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:29,607-Speed 3300.66 samples/sec Loss 40.5211 LearningRate 0.0997 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:32,674-Speed 3339.67 samples/sec Loss 40.4601 LearningRate 0.0996 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:35,785-Speed 3293.61 samples/sec Loss 40.3800 LearningRate 0.0996 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:38,829-Speed 3364.14 samples/sec Loss 40.3778 LearningRate 0.0996 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:41,876-Speed 3361.92 samples/sec Loss 40.3221 LearningRate 0.0996 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-04-27 01:49:44,894-Speed 3395.10 samples/sec Loss 40.3445 LearningRate 0.0996 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:49:47,947-Speed 3354.93 samples/sec Loss 40.2169 LearningRate 0.0996 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:49:50,965-Speed 3393.11 samples/sec Loss 40.2448 LearningRate 0.0996 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:49:54,026-Speed 3346.71 samples/sec Loss 40.1101 LearningRate 0.0996 Epoch: 0 
Global Step: 510 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:49:57,027-Speed 3413.91 samples/sec Loss 40.0999 LearningRate 0.0996 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:50:00,080-Speed 3354.44 samples/sec Loss 40.1136 LearningRate 0.0996 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:50:03,132-Speed 3356.51 samples/sec Loss 40.0496 LearningRate 0.0996 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:50:06,220-Speed 3317.39 samples/sec Loss 40.0210 LearningRate 0.0996 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:50:09,261-Speed 3368.73 samples/sec Loss 39.9529 LearningRate 0.0995 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:50:12,299-Speed 3370.99 samples/sec Loss 39.9604 LearningRate 0.0995 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:50:15,364-Speed 3342.34 samples/sec Loss 39.8885 LearningRate 0.0995 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:50:18,379-Speed 3397.49 samples/sec Loss 39.8565 LearningRate 0.0995 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:50:21,414-Speed 3375.32 samples/sec Loss 39.9028 LearningRate 0.0995 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:50:24,443-Speed 3381.12 samples/sec Loss 39.8180 LearningRate 0.0995 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:50:27,484-Speed 3368.89 samples/sec Loss 39.7620 LearningRate 0.0995 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-04-27 01:50:30,504-Speed 3391.83 samples/sec Loss 39.6574 LearningRate 0.0995 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 32768 Required: 24 hours 
Training: 2022-04-27 01:50:33,518-Speed 3398.10 samples/sec Loss 39.7127 LearningRate 0.0995 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-04-27 01:50:36,637-Speed 3284.84 samples/sec Loss 39.6027 LearningRate 0.0995 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-04-27 01:50:39,666-Speed 3381.34 samples/sec Loss 39.6790 LearningRate 0.0995 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-04-27 01:50:42,697-Speed 3379.43 samples/sec Loss 39.6265 LearningRate 0.0995 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:50:45,695-Speed 3416.99 samples/sec Loss 39.5278 LearningRate 0.0995 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:50:48,722-Speed 3383.60 samples/sec Loss 39.5224 LearningRate 0.0994 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:50:51,750-Speed 3383.18 samples/sec Loss 39.4829 LearningRate 0.0994 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:50:54,796-Speed 3363.04 samples/sec Loss 39.4744 LearningRate 0.0994 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:50:57,817-Speed 3390.30 samples/sec Loss 39.3605 LearningRate 0.0994 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:00,853-Speed 3374.14 samples/sec Loss 39.3383 LearningRate 0.0994 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 01:51:03,929-Speed 3330.10 samples/sec Loss 39.3132 LearningRate 0.0994 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 01:51:06,971-Speed 3366.98 samples/sec Loss 39.3251 LearningRate 0.0994 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 01:51:09,959-Speed 3428.76 samples/sec Loss 
39.3430 LearningRate 0.0994 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:13,023-Speed 3342.74 samples/sec Loss 39.1818 LearningRate 0.0994 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:16,115-Speed 3313.41 samples/sec Loss 39.1515 LearningRate 0.0994 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:19,203-Speed 3316.93 samples/sec Loss 39.1796 LearningRate 0.0994 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:22,240-Speed 3372.09 samples/sec Loss 39.1013 LearningRate 0.0994 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:25,307-Speed 3340.31 samples/sec Loss 39.0563 LearningRate 0.0993 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:28,432-Speed 3277.86 samples/sec Loss 38.9651 LearningRate 0.0993 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:31,510-Speed 3327.87 samples/sec Loss 38.9890 LearningRate 0.0993 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:34,585-Speed 3330.86 samples/sec Loss 38.9750 LearningRate 0.0993 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:37,648-Speed 3343.88 samples/sec Loss 38.9340 LearningRate 0.0993 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:40,777-Speed 3273.99 samples/sec Loss 38.8314 LearningRate 0.0993 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 01:51:43,820-Speed 3366.09 samples/sec Loss 38.8764 LearningRate 0.0993 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 01:51:46,853-Speed 3377.65 samples/sec Loss 38.8523 LearningRate 0.0993 Epoch: 0 Global Step: 880 Fp16 Grad 
Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:49,993-Speed 3261.78 samples/sec Loss 38.8357 LearningRate 0.0993 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:53,103-Speed 3293.51 samples/sec Loss 38.7631 LearningRate 0.0993 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:56,182-Speed 3326.88 samples/sec Loss 38.6058 LearningRate 0.0993 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:51:59,225-Speed 3366.49 samples/sec Loss 38.6693 LearningRate 0.0993 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:02,355-Speed 3271.68 samples/sec Loss 38.6394 LearningRate 0.0993 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:05,429-Speed 3332.18 samples/sec Loss 38.6148 LearningRate 0.0992 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:08,463-Speed 3376.67 samples/sec Loss 38.6299 LearningRate 0.0992 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:11,495-Speed 3378.12 samples/sec Loss 38.6108 LearningRate 0.0992 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:14,596-Speed 3303.34 samples/sec Loss 38.4830 LearningRate 0.0992 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:17,694-Speed 3306.08 samples/sec Loss 38.4203 LearningRate 0.0992 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:20,730-Speed 3375.11 samples/sec Loss 38.5578 LearningRate 0.0992 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:23,762-Speed 3377.71 samples/sec Loss 38.4096 LearningRate 0.0992 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 
01:52:26,802-Speed 3370.12 samples/sec Loss 38.3274 LearningRate 0.0992 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:29,873-Speed 3334.96 samples/sec Loss 38.3592 LearningRate 0.0992 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:32,896-Speed 3389.67 samples/sec Loss 38.1975 LearningRate 0.0992 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:35,965-Speed 3337.64 samples/sec Loss 38.2958 LearningRate 0.0992 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-04-27 01:52:39,036-Speed 3335.76 samples/sec Loss 38.2801 LearningRate 0.0992 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:42,162-Speed 3276.35 samples/sec Loss 38.1453 LearningRate 0.0991 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:45,189-Speed 3383.86 samples/sec Loss 38.1575 LearningRate 0.0991 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:48,247-Speed 3349.54 samples/sec Loss 38.0401 LearningRate 0.0991 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:51,293-Speed 3363.33 samples/sec Loss 38.1277 LearningRate 0.0991 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-04-27 01:52:54,356-Speed 3344.66 samples/sec Loss 38.1433 LearningRate 0.0991 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 01:52:57,358-Speed 3411.48 samples/sec Loss 37.9603 LearningRate 0.0991 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 01:53:00,443-Speed 3319.91 samples/sec Loss 37.9967 LearningRate 0.0991 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 01:53:03,608-Speed 3236.98 samples/sec Loss 38.0022 
LearningRate 0.0991 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 01:53:06,750-Speed 3259.96 samples/sec Loss 37.8266 LearningRate 0.0991 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 01:53:09,754-Speed 3409.34 samples/sec Loss 37.9039 LearningRate 0.0991 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:12,764-Speed 3403.25 samples/sec Loss 37.8633 LearningRate 0.0991 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:15,816-Speed 3356.33 samples/sec Loss 37.8850 LearningRate 0.0991 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:18,876-Speed 3348.08 samples/sec Loss 37.7700 LearningRate 0.0991 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:21,891-Speed 3397.95 samples/sec Loss 37.6883 LearningRate 0.0990 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:24,932-Speed 3368.60 samples/sec Loss 37.5968 LearningRate 0.0990 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:27,980-Speed 3360.91 samples/sec Loss 37.6857 LearningRate 0.0990 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:31,095-Speed 3287.46 samples/sec Loss 37.6003 LearningRate 0.0990 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:34,123-Speed 3383.61 samples/sec Loss 37.5551 LearningRate 0.0990 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:37,157-Speed 3376.64 samples/sec Loss 37.5468 LearningRate 0.0990 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:53:40,196-Speed 3370.93 samples/sec Loss 37.5010 LearningRate 0.0990 Epoch: 0 Global Step: 1250 Fp16 
Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:53:43,234-Speed 3371.57 samples/sec Loss 37.3570 LearningRate 0.0990 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:53:46,271-Speed 3372.80 samples/sec Loss 37.3103 LearningRate 0.0990 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:53:49,297-Speed 3384.68 samples/sec Loss 37.2856 LearningRate 0.0990 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:53:52,391-Speed 3310.67 samples/sec Loss 37.2521 LearningRate 0.0990 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:53:55,430-Speed 3370.76 samples/sec Loss 37.2840 LearningRate 0.0990 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:53:58,436-Speed 3407.77 samples/sec Loss 37.2665 LearningRate 0.0989 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:01,454-Speed 3392.80 samples/sec Loss 37.1833 LearningRate 0.0989 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:04,471-Speed 3395.87 samples/sec Loss 37.1366 LearningRate 0.0989 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:07,515-Speed 3365.28 samples/sec Loss 37.0269 LearningRate 0.0989 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:10,549-Speed 3375.31 samples/sec Loss 37.0381 LearningRate 0.0989 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:13,609-Speed 3347.97 samples/sec Loss 36.9762 LearningRate 0.0989 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:16,629-Speed 3391.88 samples/sec Loss 36.9910 LearningRate 0.0989 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 262144 Required: 22 hours 
Training: 2022-04-27 01:54:19,650-Speed 3390.66 samples/sec Loss 37.0017 LearningRate 0.0989 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:22,649-Speed 3416.04 samples/sec Loss 36.8510 LearningRate 0.0989 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:25,660-Speed 3401.08 samples/sec Loss 36.8457 LearningRate 0.0989 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:28,730-Speed 3336.58 samples/sec Loss 36.8839 LearningRate 0.0989 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:31,789-Speed 3348.27 samples/sec Loss 36.7760 LearningRate 0.0989 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:34,799-Speed 3403.26 samples/sec Loss 36.7593 LearningRate 0.0989 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:54:37,877-Speed 3328.03 samples/sec Loss 36.7918 LearningRate 0.0988 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:40,913-Speed 3374.78 samples/sec Loss 36.6543 LearningRate 0.0988 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:44,017-Speed 3299.03 samples/sec Loss 36.6291 LearningRate 0.0988 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:47,052-Speed 3375.30 samples/sec Loss 36.5027 LearningRate 0.0988 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:50,191-Speed 3263.66 samples/sec Loss 36.5686 LearningRate 0.0988 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:53,292-Speed 3303.08 samples/sec Loss 36.6091 LearningRate 0.0988 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:56,350-Speed 
3349.57 samples/sec Loss 36.5174 LearningRate 0.0988 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:54:59,374-Speed 3387.93 samples/sec Loss 36.4393 LearningRate 0.0988 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:02,425-Speed 3357.38 samples/sec Loss 36.4670 LearningRate 0.0988 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:05,495-Speed 3336.89 samples/sec Loss 36.2839 LearningRate 0.0988 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:08,509-Speed 3397.90 samples/sec Loss 36.3179 LearningRate 0.0988 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:11,545-Speed 3373.70 samples/sec Loss 36.2258 LearningRate 0.0988 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:14,581-Speed 3374.36 samples/sec Loss 36.2884 LearningRate 0.0987 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:17,606-Speed 3385.78 samples/sec Loss 36.2437 LearningRate 0.0987 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:20,636-Speed 3380.33 samples/sec Loss 36.1549 LearningRate 0.0987 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:23,693-Speed 3351.12 samples/sec Loss 36.0873 LearningRate 0.0987 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:26,759-Speed 3340.79 samples/sec Loss 36.0640 LearningRate 0.0987 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:29,857-Speed 3306.16 samples/sec Loss 36.0210 LearningRate 0.0987 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:32,906-Speed 3360.25 samples/sec Loss 35.9901 
LearningRate 0.0987 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:35,934-Speed 3382.99 samples/sec Loss 35.8666 LearningRate 0.0987 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:38,974-Speed 3369.27 samples/sec Loss 35.9511 LearningRate 0.0987 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:55:42,023-Speed 3359.23 samples/sec Loss 35.8333 LearningRate 0.0987 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:55:45,068-Speed 3364.01 samples/sec Loss 35.9154 LearningRate 0.0987 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:55:48,091-Speed 3387.69 samples/sec Loss 35.8219 LearningRate 0.0987 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:55:51,126-Speed 3375.13 samples/sec Loss 35.8987 LearningRate 0.0987 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:54,167-Speed 3368.38 samples/sec Loss 35.6614 LearningRate 0.0986 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:55:57,196-Speed 3382.22 samples/sec Loss 35.7201 LearningRate 0.0986 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:00,276-Speed 3325.85 samples/sec Loss 35.6189 LearningRate 0.0986 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:03,341-Speed 3342.14 samples/sec Loss 35.7531 LearningRate 0.0986 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:06,391-Speed 3358.16 samples/sec Loss 35.5763 LearningRate 0.0986 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:09,436-Speed 3363.86 samples/sec Loss 35.5198 LearningRate 0.0986 Epoch: 0 Global Step: 
1740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:12,497-Speed 3346.18 samples/sec Loss 35.5378 LearningRate 0.0986 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:15,518-Speed 3390.63 samples/sec Loss 35.3618 LearningRate 0.0986 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:18,588-Speed 3336.85 samples/sec Loss 35.3533 LearningRate 0.0986 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:56:21,631-Speed 3366.55 samples/sec Loss 35.5181 LearningRate 0.0986 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:24,705-Speed 3332.11 samples/sec Loss 35.3031 LearningRate 0.0986 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:27,768-Speed 3343.73 samples/sec Loss 35.1678 LearningRate 0.0986 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:30,830-Speed 3345.71 samples/sec Loss 35.2148 LearningRate 0.0985 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:33,847-Speed 3395.30 samples/sec Loss 35.1550 LearningRate 0.0985 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:36,910-Speed 3343.66 samples/sec Loss 35.1341 LearningRate 0.0985 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:39,982-Speed 3335.31 samples/sec Loss 35.0288 LearningRate 0.0985 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:43,065-Speed 3322.75 samples/sec Loss 35.0999 LearningRate 0.0985 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:46,100-Speed 3374.27 samples/sec Loss 35.1152 LearningRate 0.0985 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 262144 Required: 22 
hours Training: 2022-04-27 01:56:49,146-Speed 3363.28 samples/sec Loss 34.8271 LearningRate 0.0985 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:56:52,197-Speed 3358.37 samples/sec Loss 34.9084 LearningRate 0.0985 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 524288 Required: 22 hours Training: 2022-04-27 01:56:55,213-Speed 3395.95 samples/sec Loss 34.9979 LearningRate 0.0985 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 524288 Required: 22 hours Training: 2022-04-27 01:56:58,256-Speed 3366.24 samples/sec Loss 34.9201 LearningRate 0.0985 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 524288 Required: 22 hours Training: 2022-04-27 01:57:01,317-Speed 3346.27 samples/sec Loss 34.7434 LearningRate 0.0985 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:57:04,427-Speed 3293.71 samples/sec Loss 34.7154 LearningRate 0.0985 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:57:07,464-Speed 3372.23 samples/sec Loss 34.7496 LearningRate 0.0985 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:57:10,509-Speed 3363.95 samples/sec Loss 34.6284 LearningRate 0.0984 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:57:13,563-Speed 3354.25 samples/sec Loss 34.6582 LearningRate 0.0984 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:57:16,670-Speed 3297.22 samples/sec Loss 34.5810 LearningRate 0.0984 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:57:19,702-Speed 3378.31 samples/sec Loss 34.5712 LearningRate 0.0984 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:57:22,729-Speed 3383.72 samples/sec Loss 34.5485 LearningRate 0.0984 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
01:57:25,822-Speed 3312.10 samples/sec Loss 34.5315 LearningRate 0.0984 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:57:28,835-Speed 3400.62 samples/sec Loss 34.4350 LearningRate 0.0984 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:57:31,853-Speed 3394.28 samples/sec Loss 34.3605 LearningRate 0.0984 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:34,883-Speed 3379.77 samples/sec Loss 34.3506 LearningRate 0.0984 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:37,918-Speed 3375.31 samples/sec Loss 34.2997 LearningRate 0.0984 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:41,006-Speed 3316.71 samples/sec Loss 34.3040 LearningRate 0.0984 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:44,021-Speed 3397.22 samples/sec Loss 34.1602 LearningRate 0.0984 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:47,075-Speed 3355.00 samples/sec Loss 34.2168 LearningRate 0.0983 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:50,217-Speed 3259.80 samples/sec Loss 34.0678 LearningRate 0.0983 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:53,272-Speed 3352.45 samples/sec Loss 34.1722 LearningRate 0.0983 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:56,309-Speed 3372.71 samples/sec Loss 34.0779 LearningRate 0.0983 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:57:59,332-Speed 3389.32 samples/sec Loss 34.0705 LearningRate 0.0983 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:58:02,394-Speed 3344.57 samples/sec Loss 34.0464 
LearningRate 0.0983 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:05,442-Speed 3361.25 samples/sec Loss 33.9623 LearningRate 0.0983 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:08,520-Speed 3327.59 samples/sec Loss 33.9167 LearningRate 0.0983 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:11,598-Speed 3327.07 samples/sec Loss 33.7424 LearningRate 0.0983 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:14,644-Speed 3363.49 samples/sec Loss 33.8296 LearningRate 0.0983 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:17,693-Speed 3359.91 samples/sec Loss 33.5927 LearningRate 0.0983 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:20,730-Speed 3372.13 samples/sec Loss 33.6627 LearningRate 0.0983 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:23,824-Speed 3309.99 samples/sec Loss 33.5071 LearningRate 0.0983 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:26,962-Speed 3264.77 samples/sec Loss 33.6190 LearningRate 0.0982 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:30,020-Speed 3349.95 samples/sec Loss 33.5211 LearningRate 0.0982 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:58:33,084-Speed 3343.94 samples/sec Loss 33.4554 LearningRate 0.0982 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:36,248-Speed 3237.09 samples/sec Loss 33.4375 LearningRate 0.0982 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:39,281-Speed 3377.00 samples/sec Loss 33.4836 LearningRate 0.0982 Epoch: 0 Global Step: 
2230 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:42,316-Speed 3375.01 samples/sec Loss 33.2629 LearningRate 0.0982 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:45,340-Speed 3388.40 samples/sec Loss 33.2155 LearningRate 0.0982 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:48,366-Speed 3384.40 samples/sec Loss 33.3122 LearningRate 0.0982 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:51,372-Speed 3407.14 samples/sec Loss 33.2388 LearningRate 0.0982 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:54,463-Speed 3314.14 samples/sec Loss 33.1970 LearningRate 0.0982 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:58:57,517-Speed 3354.11 samples/sec Loss 33.1603 LearningRate 0.0982 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:00,589-Speed 3335.32 samples/sec Loss 33.1628 LearningRate 0.0982 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:03,651-Speed 3344.98 samples/sec Loss 33.2257 LearningRate 0.0981 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:06,707-Speed 3352.38 samples/sec Loss 33.0458 LearningRate 0.0981 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:09,709-Speed 3411.50 samples/sec Loss 32.9564 LearningRate 0.0981 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:12,768-Speed 3349.46 samples/sec Loss 33.1025 LearningRate 0.0981 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:15,803-Speed 3374.94 samples/sec Loss 32.9806 LearningRate 0.0981 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 262144 Required: 22 
hours Training: 2022-04-27 01:59:18,894-Speed 3313.35 samples/sec Loss 32.8135 LearningRate 0.0981 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:21,930-Speed 3374.60 samples/sec Loss 32.9709 LearningRate 0.0981 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:24,996-Speed 3340.50 samples/sec Loss 32.7121 LearningRate 0.0981 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 01:59:28,102-Speed 3298.37 samples/sec Loss 32.7189 LearningRate 0.0981 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 01:59:31,129-Speed 3384.54 samples/sec Loss 32.7555 LearningRate 0.0981 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:34,150-Speed 3390.41 samples/sec Loss 32.6102 LearningRate 0.0981 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:37,154-Speed 3409.65 samples/sec Loss 32.7092 LearningRate 0.0981 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:40,251-Speed 3306.65 samples/sec Loss 32.6277 LearningRate 0.0981 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:43,320-Speed 3338.06 samples/sec Loss 32.4461 LearningRate 0.0980 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:46,347-Speed 3384.49 samples/sec Loss 32.6116 LearningRate 0.0980 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:49,363-Speed 3396.01 samples/sec Loss 32.5542 LearningRate 0.0980 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:52,412-Speed 3359.58 samples/sec Loss 32.3634 LearningRate 0.0980 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:55,471-Speed 
3348.27 samples/sec Loss 32.2255 LearningRate 0.0980 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 01:59:58,522-Speed 3357.79 samples/sec Loss 32.4872 LearningRate 0.0980 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:00:01,660-Speed 3264.77 samples/sec Loss 32.1327 LearningRate 0.0980 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:00:04,729-Speed 3337.38 samples/sec Loss 32.3564 LearningRate 0.0980 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:00:07,749-Speed 3391.17 samples/sec Loss 32.1058 LearningRate 0.0980 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:00:10,787-Speed 3372.32 samples/sec Loss 32.0564 LearningRate 0.0980 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:00:13,839-Speed 3356.18 samples/sec Loss 32.2217 LearningRate 0.0980 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:00:16,906-Speed 3339.01 samples/sec Loss 32.1351 LearningRate 0.0980 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:19,947-Speed 3368.33 samples/sec Loss 32.1047 LearningRate 0.0979 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:22,979-Speed 3378.32 samples/sec Loss 31.8771 LearningRate 0.0979 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:26,132-Speed 3249.66 samples/sec Loss 31.8771 LearningRate 0.0979 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:29,195-Speed 3343.15 samples/sec Loss 31.9735 LearningRate 0.0979 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:32,265-Speed 3337.46 samples/sec Loss 31.6709 LearningRate 
0.0979 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:00:35,306-Speed 3368.15 samples/sec Loss 31.7809 LearningRate 0.0979 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:00:38,400-Speed 3311.14 samples/sec Loss 31.7211 LearningRate 0.0979 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:41,413-Speed 3398.78 samples/sec Loss 31.6951 LearningRate 0.0979 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:44,431-Speed 3395.11 samples/sec Loss 31.5371 LearningRate 0.0979 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:00:47,481-Speed 3357.51 samples/sec Loss 31.5490 LearningRate 0.0979 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:00:50,551-Speed 3337.28 samples/sec Loss 31.5991 LearningRate 0.0979 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:00:53,639-Speed 3316.89 samples/sec Loss 31.5426 LearningRate 0.0979 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:00:56,680-Speed 3368.63 samples/sec Loss 31.3511 LearningRate 0.0979 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:00:59,697-Speed 3394.54 samples/sec Loss 31.4630 LearningRate 0.0978 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:01:02,737-Speed 3370.02 samples/sec Loss 31.4372 LearningRate 0.0978 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:01:05,768-Speed 3379.90 samples/sec Loss 31.4874 LearningRate 0.0978 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:01:08,800-Speed 3378.57 samples/sec Loss 31.2393 LearningRate 0.0978 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 
32768 Required: 21 hours Training: 2022-04-27 02:01:11,872-Speed 3333.13 samples/sec Loss 31.2061 LearningRate 0.0978 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:01:14,895-Speed 3389.24 samples/sec Loss 31.1605 LearningRate 0.0978 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:01:17,946-Speed 3357.35 samples/sec Loss 31.2148 LearningRate 0.0978 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:01:20,970-Speed 3387.12 samples/sec Loss 30.9471 LearningRate 0.0978 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:24,024-Speed 3353.80 samples/sec Loss 30.9397 LearningRate 0.0978 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:27,110-Speed 3319.64 samples/sec Loss 31.0279 LearningRate 0.0978 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:30,145-Speed 3374.50 samples/sec Loss 30.9029 LearningRate 0.0978 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:33,190-Speed 3364.91 samples/sec Loss 30.9557 LearningRate 0.0978 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:36,232-Speed 3366.36 samples/sec Loss 30.9447 LearningRate 0.0978 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:39,289-Speed 3350.63 samples/sec Loss 30.8983 LearningRate 0.0977 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:42,366-Speed 3329.71 samples/sec Loss 30.7303 LearningRate 0.0977 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:45,389-Speed 3387.90 samples/sec Loss 30.7503 LearningRate 0.0977 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 
02:01:48,438-Speed 3360.51 samples/sec Loss 30.7275 LearningRate 0.0977 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:01:51,513-Speed 3330.95 samples/sec Loss 30.6591 LearningRate 0.0977 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:01:54,550-Speed 3372.45 samples/sec Loss 30.5228 LearningRate 0.0977 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:01:57,568-Speed 3393.72 samples/sec Loss 30.5720 LearningRate 0.0977 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:00,691-Speed 3280.72 samples/sec Loss 30.5071 LearningRate 0.0977 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:03,745-Speed 3353.79 samples/sec Loss 30.6058 LearningRate 0.0977 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:06,777-Speed 3377.85 samples/sec Loss 30.4940 LearningRate 0.0977 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:09,834-Speed 3350.92 samples/sec Loss 30.3444 LearningRate 0.0977 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:12,944-Speed 3293.93 samples/sec Loss 30.2513 LearningRate 0.0977 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:16,013-Speed 3337.87 samples/sec Loss 30.3496 LearningRate 0.0976 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:19,081-Speed 3338.75 samples/sec Loss 30.0628 LearningRate 0.0976 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:22,105-Speed 3386.40 samples/sec Loss 30.1535 LearningRate 0.0976 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:25,175-Speed 3337.10 samples/sec Loss 
30.1175 LearningRate 0.0976 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:28,274-Speed 3305.57 samples/sec Loss 30.0294 LearningRate 0.0976 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:31,304-Speed 3379.75 samples/sec Loss 30.0082 LearningRate 0.0976 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:34,341-Speed 3373.05 samples/sec Loss 30.0372 LearningRate 0.0976 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:02:37,383-Speed 3368.03 samples/sec Loss 29.8353 LearningRate 0.0976 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:40,457-Speed 3331.86 samples/sec Loss 30.0196 LearningRate 0.0976 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:43,545-Speed 3317.05 samples/sec Loss 29.8695 LearningRate 0.0976 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:46,576-Speed 3379.15 samples/sec Loss 29.7520 LearningRate 0.0976 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:49,589-Speed 3399.22 samples/sec Loss 29.8082 LearningRate 0.0976 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:52,653-Speed 3343.20 samples/sec Loss 29.6802 LearningRate 0.0976 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:55,681-Speed 3383.61 samples/sec Loss 29.6490 LearningRate 0.0975 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:02:58,696-Speed 3397.20 samples/sec Loss 29.6759 LearningRate 0.0975 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:01,737-Speed 3368.53 samples/sec Loss 29.6513 LearningRate 0.0975 Epoch: 0 Global 
Step: 3090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:04,820-Speed 3321.81 samples/sec Loss 29.4203 LearningRate 0.0975 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:07,865-Speed 3365.04 samples/sec Loss 29.5904 LearningRate 0.0975 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:10,903-Speed 3371.23 samples/sec Loss 29.4270 LearningRate 0.0975 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:13,985-Speed 3323.81 samples/sec Loss 29.4073 LearningRate 0.0975 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:17,040-Speed 3352.86 samples/sec Loss 29.3962 LearningRate 0.0975 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:20,076-Speed 3373.47 samples/sec Loss 29.2226 LearningRate 0.0975 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:23,096-Speed 3392.28 samples/sec Loss 29.1727 LearningRate 0.0975 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:26,131-Speed 3375.17 samples/sec Loss 29.2226 LearningRate 0.0975 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:29,186-Speed 3351.98 samples/sec Loss 29.2951 LearningRate 0.0975 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:03:32,276-Speed 3315.52 samples/sec Loss 29.1263 LearningRate 0.0974 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:03:35,318-Speed 3367.91 samples/sec Loss 29.0351 LearningRate 0.0974 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:38,360-Speed 3366.50 samples/sec Loss 29.1407 LearningRate 0.0974 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-27 02:03:41,442-Speed 3323.49 samples/sec Loss 28.9678 LearningRate 0.0974 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:44,459-Speed 3396.06 samples/sec Loss 28.9629 LearningRate 0.0974 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:47,543-Speed 3321.46 samples/sec Loss 28.9199 LearningRate 0.0974 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:50,588-Speed 3364.01 samples/sec Loss 28.7484 LearningRate 0.0974 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:53,683-Speed 3308.88 samples/sec Loss 28.6564 LearningRate 0.0974 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:56,698-Speed 3397.71 samples/sec Loss 28.7795 LearningRate 0.0974 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:03:59,767-Speed 3337.53 samples/sec Loss 28.7957 LearningRate 0.0974 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:04:02,801-Speed 3376.68 samples/sec Loss 28.7782 LearningRate 0.0974 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:04:05,856-Speed 3352.30 samples/sec Loss 28.4583 LearningRate 0.0974 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:08,865-Speed 3405.38 samples/sec Loss 28.6801 LearningRate 0.0974 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:12,001-Speed 3266.57 samples/sec Loss 28.5461 LearningRate 0.0973 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:15,093-Speed 3312.25 samples/sec Loss 28.3665 LearningRate 0.0973 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:18,150-Speed 3350.60 
samples/sec Loss 28.3219 LearningRate 0.0973 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:21,183-Speed 3377.54 samples/sec Loss 28.3290 LearningRate 0.0973 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:24,226-Speed 3366.15 samples/sec Loss 28.4336 LearningRate 0.0973 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:27,322-Speed 3307.67 samples/sec Loss 28.2266 LearningRate 0.0973 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:30,468-Speed 3256.12 samples/sec Loss 28.0418 LearningRate 0.0973 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:33,498-Speed 3380.08 samples/sec Loss 28.0531 LearningRate 0.0973 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:36,586-Speed 3318.05 samples/sec Loss 28.1476 LearningRate 0.0973 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:39,618-Speed 3377.62 samples/sec Loss 27.9580 LearningRate 0.0973 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:42,643-Speed 3386.72 samples/sec Loss 28.0544 LearningRate 0.0973 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:45,653-Speed 3402.21 samples/sec Loss 28.1431 LearningRate 0.0973 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:48,706-Speed 3355.93 samples/sec Loss 27.8662 LearningRate 0.0972 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:51,747-Speed 3367.96 samples/sec Loss 27.8903 LearningRate 0.0972 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:54,778-Speed 3379.33 samples/sec Loss 27.8997 LearningRate 
0.0972 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:04:57,802-Speed 3387.47 samples/sec Loss 27.8557 LearningRate 0.0972 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:00,851-Speed 3359.91 samples/sec Loss 27.6676 LearningRate 0.0972 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:03,894-Speed 3365.67 samples/sec Loss 27.5943 LearningRate 0.0972 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:06,954-Speed 3346.84 samples/sec Loss 27.7842 LearningRate 0.0972 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:05:09,971-Speed 3395.43 samples/sec Loss 27.5962 LearningRate 0.0972 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:05:13,086-Speed 3288.68 samples/sec Loss 27.4764 LearningRate 0.0972 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:05:16,122-Speed 3374.14 samples/sec Loss 27.5138 LearningRate 0.0972 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:05:19,134-Speed 3400.66 samples/sec Loss 27.4917 LearningRate 0.0972 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:22,141-Speed 3405.45 samples/sec Loss 27.4574 LearningRate 0.0972 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:25,210-Speed 3338.10 samples/sec Loss 27.5100 LearningRate 0.0972 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:28,272-Speed 3344.87 samples/sec Loss 27.3330 LearningRate 0.0971 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:31,354-Speed 3323.49 samples/sec Loss 27.4141 LearningRate 0.0971 Epoch: 0 Global Step: 3580 Fp16 Grad 
Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:34,383-Speed 3381.84 samples/sec Loss 27.2396 LearningRate 0.0971 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:37,411-Speed 3382.92 samples/sec Loss 27.1334 LearningRate 0.0971 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:40,440-Speed 3381.61 samples/sec Loss 27.1677 LearningRate 0.0971 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:43,499-Speed 3349.25 samples/sec Loss 27.1206 LearningRate 0.0971 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:46,544-Speed 3363.85 samples/sec Loss 27.1836 LearningRate 0.0971 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:49,574-Speed 3380.20 samples/sec Loss 26.9725 LearningRate 0.0971 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:05:52,622-Speed 3360.17 samples/sec Loss 27.0419 LearningRate 0.0971 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:05:55,681-Speed 3349.64 samples/sec Loss 26.6906 LearningRate 0.0971 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:05:58,749-Speed 3338.16 samples/sec Loss 27.0330 LearningRate 0.0971 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:01,770-Speed 3390.78 samples/sec Loss 26.9396 LearningRate 0.0971 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:04,798-Speed 3383.15 samples/sec Loss 26.8900 LearningRate 0.0971 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:07,837-Speed 3370.80 samples/sec Loss 26.6533 LearningRate 0.0970 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-27 02:06:10,870-Speed 3376.95 samples/sec Loss 26.6379 LearningRate 0.0970 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:13,899-Speed 3381.79 samples/sec Loss 26.8796 LearningRate 0.0970 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:16,949-Speed 3358.19 samples/sec Loss 26.5542 LearningRate 0.0970 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:19,975-Speed 3385.58 samples/sec Loss 26.5676 LearningRate 0.0970 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:23,017-Speed 3367.09 samples/sec Loss 26.6223 LearningRate 0.0970 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:06:26,082-Speed 3341.67 samples/sec Loss 26.3039 LearningRate 0.0970 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:29,225-Speed 3258.73 samples/sec Loss 26.4077 LearningRate 0.0970 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:32,274-Speed 3359.64 samples/sec Loss 26.4164 LearningRate 0.0970 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:35,329-Speed 3352.84 samples/sec Loss 26.5432 LearningRate 0.0970 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:38,369-Speed 3370.18 samples/sec Loss 26.3629 LearningRate 0.0970 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:41,451-Speed 3323.65 samples/sec Loss 26.3054 LearningRate 0.0970 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:44,522-Speed 3334.78 samples/sec Loss 26.2133 LearningRate 0.0969 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:47,543-Speed 3391.40 
samples/sec Loss 26.1765 LearningRate 0.0969 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:50,634-Speed 3313.15 samples/sec Loss 26.1562 LearningRate 0.0969 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:53,710-Speed 3330.36 samples/sec Loss 26.0606 LearningRate 0.0969 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:06:56,744-Speed 3376.11 samples/sec Loss 26.0855 LearningRate 0.0969 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:06:59,772-Speed 3382.49 samples/sec Loss 26.1551 LearningRate 0.0969 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 02:07:02,835-Speed 3344.20 samples/sec Loss 25.9013 LearningRate 0.0969 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:05,908-Speed 3333.08 samples/sec Loss 25.9718 LearningRate 0.0969 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:08,928-Speed 3392.00 samples/sec Loss 25.8099 LearningRate 0.0969 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:11,980-Speed 3356.58 samples/sec Loss 25.9250 LearningRate 0.0969 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:15,023-Speed 3365.91 samples/sec Loss 26.0700 LearningRate 0.0969 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:18,111-Speed 3316.79 samples/sec Loss 25.8585 LearningRate 0.0969 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:21,153-Speed 3367.26 samples/sec Loss 25.8584 LearningRate 0.0969 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:24,202-Speed 3360.10 samples/sec Loss 25.7017 LearningRate 
0.0968 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:27,273-Speed 3335.33 samples/sec Loss 25.7954 LearningRate 0.0968 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:07:30,305-Speed 3377.91 samples/sec Loss 25.5993 LearningRate 0.0968 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:33,381-Speed 3329.82 samples/sec Loss 25.5810 LearningRate 0.0968 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:36,473-Speed 3313.04 samples/sec Loss 25.5845 LearningRate 0.0968 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:39,531-Speed 3349.89 samples/sec Loss 25.6394 LearningRate 0.0968 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:42,569-Speed 3371.56 samples/sec Loss 25.3375 LearningRate 0.0968 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:45,601-Speed 3379.65 samples/sec Loss 25.4432 LearningRate 0.0968 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:48,617-Speed 3396.12 samples/sec Loss 25.4649 LearningRate 0.0968 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:51,650-Speed 3377.80 samples/sec Loss 25.3042 LearningRate 0.0968 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:54,743-Speed 3311.19 samples/sec Loss 25.2423 LearningRate 0.0968 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:07:57,768-Speed 3387.17 samples/sec Loss 25.1762 LearningRate 0.0968 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:00,816-Speed 3359.98 samples/sec Loss 25.2850 LearningRate 0.0968 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 
131072 Required: 21 hours Training: 2022-04-27 02:08:03,904-Speed 3317.31 samples/sec Loss 25.1650 LearningRate 0.0967 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:08:06,998-Speed 3310.84 samples/sec Loss 25.0991 LearningRate 0.0967 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:10,038-Speed 3369.67 samples/sec Loss 25.1216 LearningRate 0.0967 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:13,073-Speed 3375.19 samples/sec Loss 25.0646 LearningRate 0.0967 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:16,176-Speed 3301.06 samples/sec Loss 24.9888 LearningRate 0.0967 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:19,233-Speed 3350.81 samples/sec Loss 24.7987 LearningRate 0.0967 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:22,277-Speed 3364.56 samples/sec Loss 24.8689 LearningRate 0.0967 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:25,327-Speed 3358.56 samples/sec Loss 24.9560 LearningRate 0.0967 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:28,351-Speed 3386.80 samples/sec Loss 24.9713 LearningRate 0.0967 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:31,382-Speed 3380.49 samples/sec Loss 24.8111 LearningRate 0.0967 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:34,394-Speed 3400.65 samples/sec Loss 24.7948 LearningRate 0.0967 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:08:37,420-Speed 3385.11 samples/sec Loss 24.6660 LearningRate 0.0967 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
02:08:40,512-Speed 3312.79 samples/sec Loss 24.5865 LearningRate 0.0966 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:08:43,609-Speed 3308.19 samples/sec Loss 24.5190 LearningRate 0.0966 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:08:46,620-Speed 3401.68 samples/sec Loss 24.4876 LearningRate 0.0966 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:08:49,653-Speed 3377.02 samples/sec Loss 24.6122 LearningRate 0.0966 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:08:52,717-Speed 3343.24 samples/sec Loss 24.5432 LearningRate 0.0966 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:08:55,830-Speed 3290.15 samples/sec Loss 24.4519 LearningRate 0.0966 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:08:58,877-Speed 3361.71 samples/sec Loss 24.5185 LearningRate 0.0966 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:09:01,929-Speed 3356.38 samples/sec Loss 24.3330 LearningRate 0.0966 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:09:04,929-Speed 3414.17 samples/sec Loss 24.2076 LearningRate 0.0966 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:07,952-Speed 3389.16 samples/sec Loss 24.2050 LearningRate 0.0966 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:10,987-Speed 3374.93 samples/sec Loss 24.3723 LearningRate 0.0966 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:14,013-Speed 3384.67 samples/sec Loss 24.2457 LearningRate 0.0966 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:17,131-Speed 3286.01 samples/sec Loss 
24.1929 LearningRate 0.0966 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:20,134-Speed 3410.37 samples/sec Loss 24.1728 LearningRate 0.0965 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:23,178-Speed 3365.05 samples/sec Loss 24.2739 LearningRate 0.0965 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:26,232-Speed 3354.19 samples/sec Loss 24.1332 LearningRate 0.0965 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:29,246-Speed 3398.82 samples/sec Loss 23.9141 LearningRate 0.0965 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:32,336-Speed 3314.56 samples/sec Loss 24.0444 LearningRate 0.0965 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:35,343-Speed 3406.65 samples/sec Loss 24.0029 LearningRate 0.0965 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:38,376-Speed 3377.12 samples/sec Loss 23.8986 LearningRate 0.0965 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:41,432-Speed 3351.72 samples/sec Loss 23.9669 LearningRate 0.0965 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:44,471-Speed 3370.64 samples/sec Loss 24.0491 LearningRate 0.0965 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:47,528-Speed 3351.28 samples/sec Loss 23.7567 LearningRate 0.0965 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:50,624-Speed 3308.44 samples/sec Loss 23.7553 LearningRate 0.0965 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:53,760-Speed 3266.52 samples/sec Loss 23.7109 LearningRate 0.0965 Epoch: 0 Global Step: 4440 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:56,802-Speed 3366.41 samples/sec Loss 23.6292 LearningRate 0.0964 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:09:59,885-Speed 3323.34 samples/sec Loss 23.6186 LearningRate 0.0964 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:02,925-Speed 3368.98 samples/sec Loss 23.6677 LearningRate 0.0964 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:05,975-Speed 3359.17 samples/sec Loss 23.6265 LearningRate 0.0964 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:08,990-Speed 3396.13 samples/sec Loss 23.7359 LearningRate 0.0964 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:12,043-Speed 3356.54 samples/sec Loss 23.4945 LearningRate 0.0964 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:15,046-Speed 3410.57 samples/sec Loss 23.4686 LearningRate 0.0964 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:18,123-Speed 3329.06 samples/sec Loss 23.5193 LearningRate 0.0964 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:21,165-Speed 3367.14 samples/sec Loss 23.4704 LearningRate 0.0964 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:24,199-Speed 3376.28 samples/sec Loss 23.3606 LearningRate 0.0964 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:27,260-Speed 3346.17 samples/sec Loss 23.3067 LearningRate 0.0964 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:30,358-Speed 3306.29 samples/sec Loss 23.1413 LearningRate 0.0964 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-27 02:10:33,381-Speed 3388.48 samples/sec Loss 23.4764 LearningRate 0.0964 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:36,507-Speed 3277.13 samples/sec Loss 23.1061 LearningRate 0.0963 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:10:39,618-Speed 3292.84 samples/sec Loss 23.3104 LearningRate 0.0963 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:42,638-Speed 3391.00 samples/sec Loss 23.0377 LearningRate 0.0963 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:45,682-Speed 3365.51 samples/sec Loss 23.1743 LearningRate 0.0963 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:10:48,753-Speed 3335.59 samples/sec Loss 23.0194 LearningRate 0.0963 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:10:51,803-Speed 3358.63 samples/sec Loss 23.1083 LearningRate 0.0963 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:10:54,811-Speed 3404.38 samples/sec Loss 22.9668 LearningRate 0.0963 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:10:57,831-Speed 3392.04 samples/sec Loss 22.9742 LearningRate 0.0963 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:11:00,876-Speed 3364.48 samples/sec Loss 22.9980 LearningRate 0.0963 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:11:03,907-Speed 3379.37 samples/sec Loss 22.6973 LearningRate 0.0963 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:11:06,930-Speed 3387.56 samples/sec Loss 22.8743 LearningRate 0.0963 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:11:09,952-Speed 3390.39 samples/sec Loss 
22.7297 LearningRate 0.0963 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:11:12,976-Speed 3387.22 samples/sec Loss 22.7508 LearningRate 0.0963 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:11:15,997-Speed 3390.42 samples/sec Loss 22.6745 LearningRate 0.0962 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:11:19,008-Speed 3402.00 samples/sec Loss 22.5501 LearningRate 0.0962 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:22,015-Speed 3406.03 samples/sec Loss 22.6632 LearningRate 0.0962 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:25,077-Speed 3346.03 samples/sec Loss 22.7938 LearningRate 0.0962 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:28,121-Speed 3364.91 samples/sec Loss 22.7427 LearningRate 0.0962 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:31,152-Speed 3379.24 samples/sec Loss 22.4697 LearningRate 0.0962 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:34,156-Speed 3409.21 samples/sec Loss 22.5401 LearningRate 0.0962 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:37,184-Speed 3383.61 samples/sec Loss 22.4035 LearningRate 0.0962 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:40,197-Speed 3399.29 samples/sec Loss 22.4862 LearningRate 0.0962 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:43,208-Speed 3401.57 samples/sec Loss 22.4427 LearningRate 0.0962 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:46,214-Speed 3407.86 samples/sec Loss 22.4753 LearningRate 0.0962 Epoch: 0 Global Step: 4810 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:49,219-Speed 3409.28 samples/sec Loss 22.2820 LearningRate 0.0962 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:11:52,250-Speed 3379.22 samples/sec Loss 22.3884 LearningRate 0.0961 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:55,249-Speed 3415.74 samples/sec Loss 22.2711 LearningRate 0.0961 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:11:58,306-Speed 3350.68 samples/sec Loss 22.3121 LearningRate 0.0961 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:01,375-Speed 3337.95 samples/sec Loss 22.1146 LearningRate 0.0961 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:04,442-Speed 3340.09 samples/sec Loss 22.1636 LearningRate 0.0961 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:07,515-Speed 3332.55 samples/sec Loss 22.1368 LearningRate 0.0961 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:10,513-Speed 3416.91 samples/sec Loss 21.8689 LearningRate 0.0961 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:13,533-Speed 3392.98 samples/sec Loss 22.1743 LearningRate 0.0961 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:16,549-Speed 3396.36 samples/sec Loss 21.8873 LearningRate 0.0961 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:19,566-Speed 3394.45 samples/sec Loss 21.9462 LearningRate 0.0961 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:22,585-Speed 3393.27 samples/sec Loss 21.8770 LearningRate 0.0961 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 
2022-04-27 02:12:25,598-Speed 3399.90 samples/sec Loss 21.8689 LearningRate 0.0961 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:28,629-Speed 3379.60 samples/sec Loss 21.8521 LearningRate 0.0961 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:31,689-Speed 3347.72 samples/sec Loss 21.8918 LearningRate 0.0960 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:34,692-Speed 3411.25 samples/sec Loss 21.7984 LearningRate 0.0960 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:37,721-Speed 3381.28 samples/sec Loss 21.8414 LearningRate 0.0960 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:40,768-Speed 3361.78 samples/sec Loss 21.9558 LearningRate 0.0960 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:43,835-Speed 3340.07 samples/sec Loss 21.8070 LearningRate 0.0960 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:46,881-Speed 3363.37 samples/sec Loss 21.7039 LearningRate 0.0960 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:49,889-Speed 3404.57 samples/sec Loss 21.6914 LearningRate 0.0960 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:52,942-Speed 3354.92 samples/sec Loss 21.7416 LearningRate 0.0960 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:55,975-Speed 3377.77 samples/sec Loss 21.8275 LearningRate 0.0960 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:12:59,001-Speed 3385.06 samples/sec Loss 21.4931 LearningRate 0.0960 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:02,051-Speed 3359.30 samples/sec Loss 
21.7104 LearningRate 0.0960 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:05,152-Speed 3302.44 samples/sec Loss 21.3495 LearningRate 0.0960 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:08,234-Speed 3323.71 samples/sec Loss 21.4110 LearningRate 0.0960 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:11,301-Speed 3340.59 samples/sec Loss 21.5847 LearningRate 0.0959 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:14,327-Speed 3385.08 samples/sec Loss 21.4539 LearningRate 0.0959 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:17,343-Speed 3396.58 samples/sec Loss 21.3870 LearningRate 0.0959 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:20,385-Speed 3367.15 samples/sec Loss 21.4449 LearningRate 0.0959 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:23,417-Speed 3377.78 samples/sec Loss 21.2547 LearningRate 0.0959 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:26,439-Speed 3389.76 samples/sec Loss 21.2531 LearningRate 0.0959 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:29,463-Speed 3386.93 samples/sec Loss 21.2555 LearningRate 0.0959 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:32,489-Speed 3385.42 samples/sec Loss 21.3051 LearningRate 0.0959 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:35,529-Speed 3369.66 samples/sec Loss 21.1990 LearningRate 0.0959 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:13:38,591-Speed 3345.48 samples/sec Loss 21.0107 LearningRate 0.0959 Epoch: 0 Global Step: 5180 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:41,690-Speed 3305.42 samples/sec Loss 21.0918 LearningRate 0.0959 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:44,697-Speed 3405.50 samples/sec Loss 20.8657 LearningRate 0.0959 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:47,770-Speed 3334.13 samples/sec Loss 20.9328 LearningRate 0.0958 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:50,798-Speed 3382.53 samples/sec Loss 20.9787 LearningRate 0.0958 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:53,954-Speed 3245.95 samples/sec Loss 21.0067 LearningRate 0.0958 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:13:57,006-Speed 3355.87 samples/sec Loss 21.0497 LearningRate 0.0958 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:00,106-Speed 3304.84 samples/sec Loss 20.9729 LearningRate 0.0958 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:03,128-Speed 3389.40 samples/sec Loss 21.0634 LearningRate 0.0958 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:06,249-Speed 3281.63 samples/sec Loss 20.9382 LearningRate 0.0958 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:09,279-Speed 3381.37 samples/sec Loss 20.8344 LearningRate 0.0958 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:14:12,315-Speed 3374.14 samples/sec Loss 20.7467 LearningRate 0.0958 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:15,334-Speed 3392.41 samples/sec Loss 20.7336 LearningRate 0.0958 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-27 02:14:18,359-Speed 3385.91 samples/sec Loss 20.9109 LearningRate 0.0958 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:21,370-Speed 3402.08 samples/sec Loss 20.6746 LearningRate 0.0958 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:24,376-Speed 3409.39 samples/sec Loss 20.4640 LearningRate 0.0958 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:27,450-Speed 3331.69 samples/sec Loss 20.7739 LearningRate 0.0957 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:30,489-Speed 3371.23 samples/sec Loss 20.7419 LearningRate 0.0957 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:33,491-Speed 3412.39 samples/sec Loss 20.7129 LearningRate 0.0957 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:36,551-Speed 3347.68 samples/sec Loss 20.6918 LearningRate 0.0957 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:39,644-Speed 3311.22 samples/sec Loss 20.6485 LearningRate 0.0957 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:42,662-Speed 3394.67 samples/sec Loss 20.4216 LearningRate 0.0957 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:45,663-Speed 3412.31 samples/sec Loss 20.5417 LearningRate 0.0957 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:48,680-Speed 3396.14 samples/sec Loss 20.4340 LearningRate 0.0957 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:14:51,720-Speed 3369.91 samples/sec Loss 20.2416 LearningRate 0.0957 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:14:54,736-Speed 3396.22 samples/sec Loss 
20.3339 LearningRate 0.0957 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:14:57,764-Speed 3382.44 samples/sec Loss 20.3454 LearningRate 0.0957 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:00,808-Speed 3365.46 samples/sec Loss 20.3512 LearningRate 0.0957 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:03,812-Speed 3409.50 samples/sec Loss 20.2448 LearningRate 0.0957 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:06,837-Speed 3385.97 samples/sec Loss 20.2229 LearningRate 0.0956 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:09,867-Speed 3381.15 samples/sec Loss 20.2804 LearningRate 0.0956 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:12,925-Speed 3349.13 samples/sec Loss 20.2078 LearningRate 0.0956 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:15,973-Speed 3360.95 samples/sec Loss 20.3010 LearningRate 0.0956 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:19,004-Speed 3379.56 samples/sec Loss 20.2488 LearningRate 0.0956 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:22,023-Speed 3392.34 samples/sec Loss 20.2289 LearningRate 0.0956 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:15:25,047-Speed 3387.44 samples/sec Loss 20.1103 LearningRate 0.0956 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:28,118-Speed 3335.95 samples/sec Loss 20.2316 LearningRate 0.0956 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:31,155-Speed 3372.23 samples/sec Loss 20.2224 LearningRate 0.0956 Epoch: 0 Global Step: 5550 
Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:34,240-Speed 3321.56 samples/sec Loss 20.2049 LearningRate 0.0956 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:37,313-Speed 3332.40 samples/sec Loss 20.0200 LearningRate 0.0956 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:40,379-Speed 3341.36 samples/sec Loss 19.9634 LearningRate 0.0956 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:43,457-Speed 3327.35 samples/sec Loss 20.0507 LearningRate 0.0956 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:46,508-Speed 3357.85 samples/sec Loss 20.0461 LearningRate 0.0955 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:49,567-Speed 3348.68 samples/sec Loss 19.9137 LearningRate 0.0955 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:52,652-Speed 3320.42 samples/sec Loss 19.8032 LearningRate 0.0955 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:55,730-Speed 3327.14 samples/sec Loss 19.8615 LearningRate 0.0955 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:15:58,766-Speed 3373.92 samples/sec Loss 19.8116 LearningRate 0.0955 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:01,825-Speed 3348.55 samples/sec Loss 19.8653 LearningRate 0.0955 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:04,939-Speed 3290.28 samples/sec Loss 19.6968 LearningRate 0.0955 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:07,971-Speed 3378.46 samples/sec Loss 19.6801 LearningRate 0.0955 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 
2022-04-27 02:16:10,989-Speed 3393.93 samples/sec Loss 19.6700 LearningRate 0.0955 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:14,001-Speed 3401.03 samples/sec Loss 19.8872 LearningRate 0.0955 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:17,148-Speed 3254.26 samples/sec Loss 19.7362 LearningRate 0.0955 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:20,176-Speed 3382.79 samples/sec Loss 19.5734 LearningRate 0.0955 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:23,179-Speed 3411.44 samples/sec Loss 19.6703 LearningRate 0.0954 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:26,322-Speed 3259.61 samples/sec Loss 19.4415 LearningRate 0.0954 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:16:29,405-Speed 3321.73 samples/sec Loss 19.6119 LearningRate 0.0954 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:16:32,474-Speed 3337.92 samples/sec Loss 19.4744 LearningRate 0.0954 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:16:35,539-Speed 3341.91 samples/sec Loss 19.4289 LearningRate 0.0954 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:16:38,626-Speed 3318.49 samples/sec Loss 19.4231 LearningRate 0.0954 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:16:41,710-Speed 3320.79 samples/sec Loss 19.5562 LearningRate 0.0954 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:16:44,745-Speed 3375.38 samples/sec Loss 19.3348 LearningRate 0.0954 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:48,325-Speed 2861.23 samples/sec Loss 
19.4014 LearningRate 0.0954 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:51,393-Speed 3338.46 samples/sec Loss 19.4031 LearningRate 0.0954 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:54,472-Speed 3327.30 samples/sec Loss 19.2788 LearningRate 0.0954 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:16:57,485-Speed 3398.90 samples/sec Loss 19.3726 LearningRate 0.0954 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:17:00,511-Speed 3385.13 samples/sec Loss 19.3568 LearningRate 0.0954 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:17:03,524-Speed 3399.51 samples/sec Loss 19.3167 LearningRate 0.0953 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:17:06,523-Speed 3416.63 samples/sec Loss 19.1248 LearningRate 0.0953 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:17:09,522-Speed 3415.17 samples/sec Loss 19.2465 LearningRate 0.0953 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:17:12,556-Speed 3376.26 samples/sec Loss 19.3398 LearningRate 0.0953 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:17:15,659-Speed 3300.24 samples/sec Loss 19.2733 LearningRate 0.0953 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:18,668-Speed 3404.52 samples/sec Loss 19.1388 LearningRate 0.0953 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:21,673-Speed 3409.61 samples/sec Loss 18.9335 LearningRate 0.0953 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:24,702-Speed 3380.87 samples/sec Loss 18.9983 LearningRate 0.0953 Epoch: 0 Global Step: 5920 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:27,720-Speed 3394.17 samples/sec Loss 19.3016 LearningRate 0.0953 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:30,777-Speed 3351.62 samples/sec Loss 19.0702 LearningRate 0.0953 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:33,792-Speed 3397.30 samples/sec Loss 19.2086 LearningRate 0.0953 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:36,902-Speed 3294.00 samples/sec Loss 19.0789 LearningRate 0.0953 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:40,004-Speed 3301.76 samples/sec Loss 18.9874 LearningRate 0.0953 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:43,021-Speed 3395.26 samples/sec Loss 18.9292 LearningRate 0.0952 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:46,018-Speed 3418.40 samples/sec Loss 18.7594 LearningRate 0.0952 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:49,073-Speed 3352.20 samples/sec Loss 18.8444 LearningRate 0.0952 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:52,103-Speed 3380.33 samples/sec Loss 18.8487 LearningRate 0.0952 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:55,190-Speed 3319.23 samples/sec Loss 18.7093 LearningRate 0.0952 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:17:58,173-Speed 3433.44 samples/sec Loss 18.8933 LearningRate 0.0952 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:01,259-Speed 3319.40 samples/sec Loss 18.8089 LearningRate 0.0952 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 32768 Required: 21 hours Training: 
2022-04-27 02:18:04,305-Speed 3362.27 samples/sec Loss 18.7497 LearningRate 0.0952 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:07,348-Speed 3366.95 samples/sec Loss 18.8858 LearningRate 0.0952 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:10,353-Speed 3408.14 samples/sec Loss 18.7387 LearningRate 0.0952 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:14,181-Speed 2675.54 samples/sec Loss 18.7094 LearningRate 0.0952 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:17,194-Speed 3399.44 samples/sec Loss 18.6326 LearningRate 0.0952 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:20,225-Speed 3379.66 samples/sec Loss 18.8026 LearningRate 0.0951 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:24,779-Speed 2249.11 samples/sec Loss 18.5482 LearningRate 0.0951 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:27,805-Speed 3385.24 samples/sec Loss 18.5365 LearningRate 0.0951 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:18:30,836-Speed 3379.35 samples/sec Loss 18.6434 LearningRate 0.0951 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:33,866-Speed 3381.48 samples/sec Loss 18.5056 LearningRate 0.0951 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:36,891-Speed 3386.26 samples/sec Loss 18.4670 LearningRate 0.0951 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:39,971-Speed 3325.19 samples/sec Loss 18.4393 LearningRate 0.0951 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:42,993-Speed 3389.43 samples/sec Loss 
18.4555 LearningRate 0.0951 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:46,072-Speed 3326.93 samples/sec Loss 18.2998 LearningRate 0.0951 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:49,116-Speed 3365.01 samples/sec Loss 18.3600 LearningRate 0.0951 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:52,166-Speed 3358.68 samples/sec Loss 18.3082 LearningRate 0.0951 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:55,211-Speed 3367.90 samples/sec Loss 18.3712 LearningRate 0.0951 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:18:58,285-Speed 3332.85 samples/sec Loss 18.3573 LearningRate 0.0951 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:19:01,405-Speed 3283.36 samples/sec Loss 18.3320 LearningRate 0.0950 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:19:04,466-Speed 3346.31 samples/sec Loss 18.3330 LearningRate 0.0950 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:19:07,511-Speed 3363.06 samples/sec Loss 18.1172 LearningRate 0.0950 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:19:10,538-Speed 3383.92 samples/sec Loss 18.3042 LearningRate 0.0950 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:19:13,649-Speed 3293.37 samples/sec Loss 18.2177 LearningRate 0.0950 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:16,724-Speed 3331.15 samples/sec Loss 18.3747 LearningRate 0.0950 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:19,784-Speed 3346.65 samples/sec Loss 18.1673 LearningRate 0.0950 Epoch: 0 Global Step: 6290 
Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:22,833-Speed 3359.50 samples/sec Loss 18.1631 LearningRate 0.0950 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:25,874-Speed 3369.04 samples/sec Loss 18.0954 LearningRate 0.0950 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:28,976-Speed 3302.27 samples/sec Loss 17.9901 LearningRate 0.0950 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:32,009-Speed 3377.04 samples/sec Loss 18.0002 LearningRate 0.0950 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:35,060-Speed 3357.05 samples/sec Loss 17.9670 LearningRate 0.0950 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:38,114-Speed 3354.17 samples/sec Loss 18.0064 LearningRate 0.0950 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:41,133-Speed 3392.85 samples/sec Loss 17.9806 LearningRate 0.0949 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:44,148-Speed 3398.12 samples/sec Loss 17.9624 LearningRate 0.0949 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:47,180-Speed 3378.45 samples/sec Loss 17.8904 LearningRate 0.0949 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:50,181-Speed 3412.79 samples/sec Loss 17.7599 LearningRate 0.0949 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:53,182-Speed 3412.92 samples/sec Loss 17.8070 LearningRate 0.0949 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:19:56,220-Speed 3371.81 samples/sec Loss 17.9383 LearningRate 0.0949 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 32768 Required: 21 hours Training: 
2022-04-27 02:19:59,286-Speed 3341.35 samples/sec Loss 17.9659 LearningRate 0.0949 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:20:02,352-Speed 3340.95 samples/sec Loss 17.8814 LearningRate 0.0949 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:20:05,371-Speed 3393.48 samples/sec Loss 17.8385 LearningRate 0.0949 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:20:08,438-Speed 3339.06 samples/sec Loss 17.7980 LearningRate 0.0949 Epoch: 0 Global Step: 6450 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:20:11,472-Speed 3377.12 samples/sec Loss 17.9913 LearningRate 0.0949 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:20:14,571-Speed 3304.92 samples/sec Loss 17.8704 LearningRate 0.0949 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:17,607-Speed 3374.17 samples/sec Loss 17.8592 LearningRate 0.0949 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:20,636-Speed 3381.23 samples/sec Loss 17.5519 LearningRate 0.0948 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:23,650-Speed 3398.62 samples/sec Loss 17.7118 LearningRate 0.0948 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:26,683-Speed 3377.95 samples/sec Loss 17.7970 LearningRate 0.0948 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:29,720-Speed 3371.99 samples/sec Loss 17.6537 LearningRate 0.0948 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:32,779-Speed 3349.74 samples/sec Loss 17.6085 LearningRate 0.0948 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:35,822-Speed 3365.45 samples/sec Loss 
17.5641 LearningRate 0.0948 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:38,871-Speed 3359.57 samples/sec Loss 17.7718 LearningRate 0.0948 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:41,948-Speed 3329.34 samples/sec Loss 17.6572 LearningRate 0.0948 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:44,965-Speed 3394.73 samples/sec Loss 17.5290 LearningRate 0.0948 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:48,003-Speed 3371.62 samples/sec Loss 17.6317 LearningRate 0.0948 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:51,048-Speed 3363.99 samples/sec Loss 17.5668 LearningRate 0.0948 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:54,155-Speed 3297.20 samples/sec Loss 17.5147 LearningRate 0.0948 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:20:57,240-Speed 3319.36 samples/sec Loss 17.5341 LearningRate 0.0947 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:21:00,292-Speed 3356.02 samples/sec Loss 17.5568 LearningRate 0.0947 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:21:03,377-Speed 3321.35 samples/sec Loss 17.6925 LearningRate 0.0947 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:06,452-Speed 3330.93 samples/sec Loss 17.3901 LearningRate 0.0947 Epoch: 0 Global Step: 6640 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:09,447-Speed 3420.07 samples/sec Loss 17.4746 LearningRate 0.0947 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:12,484-Speed 3373.06 samples/sec Loss 17.5558 LearningRate 0.0947 Epoch: 0 Global Step: 6660 
Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:15,545-Speed 3346.39 samples/sec Loss 17.5045 LearningRate 0.0947 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:18,594-Speed 3359.15 samples/sec Loss 17.5088 LearningRate 0.0947 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:21,625-Speed 3379.32 samples/sec Loss 17.5500 LearningRate 0.0947 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:24,675-Speed 3359.02 samples/sec Loss 17.3938 LearningRate 0.0947 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:27,710-Speed 3375.24 samples/sec Loss 17.2365 LearningRate 0.0947 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:30,746-Speed 3373.55 samples/sec Loss 17.3121 LearningRate 0.0947 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:33,776-Speed 3380.97 samples/sec Loss 17.3580 LearningRate 0.0947 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:21:36,814-Speed 3371.91 samples/sec Loss 17.2010 LearningRate 0.0946 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:21:39,822-Speed 3405.00 samples/sec Loss 17.3129 LearningRate 0.0946 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:42,882-Speed 3347.61 samples/sec Loss 17.2898 LearningRate 0.0946 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:45,890-Speed 3405.55 samples/sec Loss 17.2959 LearningRate 0.0946 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:48,910-Speed 3391.93 samples/sec Loss 17.0935 LearningRate 0.0946 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 32768 Required: 21 hours Training: 
2022-04-27 02:21:51,944-Speed 3376.32 samples/sec Loss 17.3204 LearningRate 0.0946 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:54,974-Speed 3379.93 samples/sec Loss 17.1124 LearningRate 0.0946 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:21:58,000-Speed 3386.01 samples/sec Loss 17.2942 LearningRate 0.0946 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:22:01,030-Speed 3380.20 samples/sec Loss 17.1193 LearningRate 0.0946 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:22:04,092-Speed 3345.29 samples/sec Loss 17.0106 LearningRate 0.0946 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:22:07,098-Speed 3407.78 samples/sec Loss 17.1215 LearningRate 0.0946 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:22:10,104-Speed 3408.13 samples/sec Loss 17.0658 LearningRate 0.0946 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:22:13,182-Speed 3327.56 samples/sec Loss 17.2164 LearningRate 0.0946 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:22:16,257-Speed 3331.89 samples/sec Loss 17.1545 LearningRate 0.0945 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:22:19,258-Speed 3412.74 samples/sec Loss 17.0109 LearningRate 0.0945 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:22,276-Speed 3394.02 samples/sec Loss 16.9847 LearningRate 0.0945 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:25,304-Speed 3383.53 samples/sec Loss 17.1891 LearningRate 0.0945 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:28,327-Speed 3388.68 samples/sec Loss 
16.9376 LearningRate 0.0945 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:31,353-Speed 3384.32 samples/sec Loss 16.9385 LearningRate 0.0945 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:34,405-Speed 3357.44 samples/sec Loss 16.9332 LearningRate 0.0945 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:37,448-Speed 3365.51 samples/sec Loss 17.0098 LearningRate 0.0945 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:40,530-Speed 3324.34 samples/sec Loss 17.0161 LearningRate 0.0945 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:43,585-Speed 3352.43 samples/sec Loss 16.7965 LearningRate 0.0945 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:46,621-Speed 3373.35 samples/sec Loss 16.8241 LearningRate 0.0945 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-04-27 02:22:49,641-Speed 3391.71 samples/sec Loss 16.6479 LearningRate 0.0945 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:22:52,709-Speed 3339.25 samples/sec Loss 16.9037 LearningRate 0.0945 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:22:55,739-Speed 3380.13 samples/sec Loss 16.6872 LearningRate 0.0944 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:22:58,766-Speed 3384.27 samples/sec Loss 17.0340 LearningRate 0.0944 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:01,789-Speed 3389.34 samples/sec Loss 16.7981 LearningRate 0.0944 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:04,832-Speed 3365.63 samples/sec Loss 16.6041 LearningRate 0.0944 Epoch: 0 Global Step: 7030 
Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:07,839-Speed 3406.02 samples/sec Loss 16.6277 LearningRate 0.0944 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:10,885-Speed 3363.70 samples/sec Loss 16.6833 LearningRate 0.0944 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:13,897-Speed 3400.15 samples/sec Loss 16.6205 LearningRate 0.0944 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:16,923-Speed 3384.93 samples/sec Loss 16.5948 LearningRate 0.0944 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:19,939-Speed 3396.95 samples/sec Loss 16.7893 LearningRate 0.0944 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:23:22,946-Speed 3406.54 samples/sec Loss 16.7299 LearningRate 0.0944 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:23:25,974-Speed 3382.90 samples/sec Loss 16.6761 LearningRate 0.0944 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:23:29,011-Speed 3372.23 samples/sec Loss 16.4482 LearningRate 0.0944 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:23:32,009-Speed 3417.07 samples/sec Loss 16.6936 LearningRate 0.0943 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:23:35,015-Speed 3407.35 samples/sec Loss 16.4689 LearningRate 0.0943 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:23:38,015-Speed 3413.86 samples/sec Loss 16.4481 LearningRate 0.0943 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:41,101-Speed 3319.83 samples/sec Loss 16.5386 LearningRate 0.0943 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 32768 Required: 21 hours Training: 
2022-04-27 02:23:44,098-Speed 3417.30 samples/sec Loss 16.5728 LearningRate 0.0943 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:47,132-Speed 3376.84 samples/sec Loss 16.5424 LearningRate 0.0943 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:50,159-Speed 3384.08 samples/sec Loss 16.5434 LearningRate 0.0943 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:53,207-Speed 3359.97 samples/sec Loss 16.4116 LearningRate 0.0943 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:56,223-Speed 3396.57 samples/sec Loss 16.4286 LearningRate 0.0943 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:23:59,235-Speed 3400.99 samples/sec Loss 16.4632 LearningRate 0.0943 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:02,260-Speed 3385.66 samples/sec Loss 16.3344 LearningRate 0.0943 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:05,277-Speed 3394.94 samples/sec Loss 16.3463 LearningRate 0.0943 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:08,272-Speed 3420.89 samples/sec Loss 16.4364 LearningRate 0.0943 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:11,314-Speed 3366.89 samples/sec Loss 16.2549 LearningRate 0.0942 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:14,332-Speed 3394.95 samples/sec Loss 16.2656 LearningRate 0.0942 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:17,341-Speed 3403.97 samples/sec Loss 16.3522 LearningRate 0.0942 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:20,348-Speed 3406.47 samples/sec Loss 
16.1795 LearningRate 0.0942 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:23,469-Speed 3282.38 samples/sec Loss 16.2808 LearningRate 0.0942 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:26,511-Speed 3366.89 samples/sec Loss 16.1559 LearningRate 0.0942 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:29,512-Speed 3413.64 samples/sec Loss 16.2610 LearningRate 0.0942 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:32,525-Speed 3399.17 samples/sec Loss 16.1598 LearningRate 0.0942 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:35,536-Speed 3401.96 samples/sec Loss 16.1316 LearningRate 0.0942 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:24:38,558-Speed 3389.54 samples/sec Loss 16.0974 LearningRate 0.0942 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:24:41,630-Speed 3334.90 samples/sec Loss 16.0439 LearningRate 0.0942 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:24:44,669-Speed 3371.20 samples/sec Loss 16.2218 LearningRate 0.0942 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:24:47,678-Speed 3403.44 samples/sec Loss 16.0729 LearningRate 0.0942 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:24:50,695-Speed 3395.93 samples/sec Loss 16.1185 LearningRate 0.0941 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:24:53,750-Speed 3351.79 samples/sec Loss 15.9999 LearningRate 0.0941 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:24:56,784-Speed 3376.45 samples/sec Loss 16.0914 LearningRate 0.0941 Epoch: 0 Global Step: 7400 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:24:59,810-Speed 3385.48 samples/sec Loss 16.1229 LearningRate 0.0941 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:25:02,832-Speed 3389.86 samples/sec Loss 15.9073 LearningRate 0.0941 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:25:05,886-Speed 3353.74 samples/sec Loss 16.0265 LearningRate 0.0941 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:08,897-Speed 3402.31 samples/sec Loss 15.8871 LearningRate 0.0941 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:11,938-Speed 3368.37 samples/sec Loss 15.9589 LearningRate 0.0941 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:14,980-Speed 3367.49 samples/sec Loss 15.8745 LearningRate 0.0941 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:18,054-Speed 3332.29 samples/sec Loss 15.9191 LearningRate 0.0941 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:21,065-Speed 3401.20 samples/sec Loss 16.0280 LearningRate 0.0941 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:24,138-Speed 3333.18 samples/sec Loss 15.9953 LearningRate 0.0941 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:27,199-Speed 3346.65 samples/sec Loss 15.8297 LearningRate 0.0941 Epoch: 0 Global Step: 7500 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:30,218-Speed 3393.08 samples/sec Loss 15.7890 LearningRate 0.0940 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:33,251-Speed 3377.38 samples/sec Loss 15.9123 LearningRate 0.0940 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 32768 Required: 21 hours Training: 
2022-04-27 02:25:36,353-Speed 3301.70 samples/sec Loss 15.8976 LearningRate 0.0940 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:25:39,430-Speed 3329.32 samples/sec Loss 15.8619 LearningRate 0.0940 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:42,471-Speed 3367.66 samples/sec Loss 15.8882 LearningRate 0.0940 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:45,491-Speed 3392.04 samples/sec Loss 15.8700 LearningRate 0.0940 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:48,563-Speed 3334.89 samples/sec Loss 15.8580 LearningRate 0.0940 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:51,704-Speed 3260.81 samples/sec Loss 15.9012 LearningRate 0.0940 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:54,764-Speed 3348.18 samples/sec Loss 16.0402 LearningRate 0.0940 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:25:57,800-Speed 3374.07 samples/sec Loss 15.9507 LearningRate 0.0940 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:26:00,913-Speed 3291.07 samples/sec Loss 15.8372 LearningRate 0.0940 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:26:04,012-Speed 3305.14 samples/sec Loss 15.7950 LearningRate 0.0940 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:26:07,053-Speed 3369.31 samples/sec Loss 15.6979 LearningRate 0.0940 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:26:10,092-Speed 3369.87 samples/sec Loss 15.6591 LearningRate 0.0939 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:13,145-Speed 3355.22 samples/sec Loss 
15.7075 LearningRate 0.0939 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:16,186-Speed 3368.27 samples/sec Loss 15.6055 LearningRate 0.0939 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:19,240-Speed 3354.19 samples/sec Loss 15.7391 LearningRate 0.0939 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:22,250-Speed 3403.22 samples/sec Loss 15.8573 LearningRate 0.0939 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:25,407-Speed 3244.38 samples/sec Loss 15.7252 LearningRate 0.0939 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:28,464-Speed 3350.16 samples/sec Loss 15.6564 LearningRate 0.0939 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:31,526-Speed 3345.29 samples/sec Loss 15.7173 LearningRate 0.0939 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:34,564-Speed 3372.63 samples/sec Loss 15.6437 LearningRate 0.0939 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:37,655-Speed 3313.84 samples/sec Loss 15.6949 LearningRate 0.0939 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:40,758-Speed 3300.97 samples/sec Loss 15.6978 LearningRate 0.0939 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:26:43,790-Speed 3377.89 samples/sec Loss 15.8214 LearningRate 0.0939 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:46,867-Speed 3329.10 samples/sec Loss 15.5763 LearningRate 0.0939 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:49,897-Speed 3380.55 samples/sec Loss 15.5351 LearningRate 0.0938 Epoch: 0 Global Step: 7770 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:52,972-Speed 3331.39 samples/sec Loss 15.4865 LearningRate 0.0938 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:56,020-Speed 3360.70 samples/sec Loss 15.4382 LearningRate 0.0938 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:26:59,065-Speed 3363.58 samples/sec Loss 15.4773 LearningRate 0.0938 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:02,157-Speed 3313.01 samples/sec Loss 15.4587 LearningRate 0.0938 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:05,202-Speed 3363.35 samples/sec Loss 15.5661 LearningRate 0.0938 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:08,221-Speed 3393.56 samples/sec Loss 15.4630 LearningRate 0.0938 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:11,257-Speed 3374.46 samples/sec Loss 15.3276 LearningRate 0.0938 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:14,272-Speed 3396.41 samples/sec Loss 15.4344 LearningRate 0.0938 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:27:17,297-Speed 3386.68 samples/sec Loss 15.3681 LearningRate 0.0938 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:20,335-Speed 3372.17 samples/sec Loss 15.3410 LearningRate 0.0938 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:23,397-Speed 3344.67 samples/sec Loss 15.4013 LearningRate 0.0938 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:26,425-Speed 3382.89 samples/sec Loss 15.5220 LearningRate 0.0937 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-27 02:27:29,473-Speed 3361.16 samples/sec Loss 15.3113 LearningRate 0.0937 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:32,496-Speed 3388.01 samples/sec Loss 15.4449 LearningRate 0.0937 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:35,569-Speed 3333.29 samples/sec Loss 15.4131 LearningRate 0.0937 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:38,604-Speed 3375.85 samples/sec Loss 15.2875 LearningRate 0.0937 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:41,639-Speed 3374.06 samples/sec Loss 15.3911 LearningRate 0.0937 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:44,659-Speed 3392.27 samples/sec Loss 15.1920 LearningRate 0.0937 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:27:47,680-Speed 3390.82 samples/sec Loss 15.2394 LearningRate 0.0937 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:27:50,717-Speed 3372.86 samples/sec Loss 15.2062 LearningRate 0.0937 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:27:53,746-Speed 3382.20 samples/sec Loss 15.1675 LearningRate 0.0937 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:27:56,813-Speed 3339.40 samples/sec Loss 15.4406 LearningRate 0.0937 Epoch: 0 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:27:59,900-Speed 3317.98 samples/sec Loss 15.1063 LearningRate 0.0937 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:28:03,034-Speed 3269.12 samples/sec Loss 15.2816 LearningRate 0.0937 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:28:06,103-Speed 3337.18 samples/sec 
Loss 15.0493 LearningRate 0.0936 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 02:28:09,145-Speed 3366.83 samples/sec Loss 15.1079 LearningRate 0.0936 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:28:12,193-Speed 3361.77 samples/sec Loss 15.0099 LearningRate 0.0936 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:28:15,213-Speed 3390.92 samples/sec Loss 15.0728 LearningRate 0.0936 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:28:18,275-Speed 3345.55 samples/sec Loss 15.2202 LearningRate 0.0936 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:28:21,328-Speed 3355.84 samples/sec Loss 15.0637 LearningRate 0.0936 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:28:24,399-Speed 3335.25 samples/sec Loss 15.2076 LearningRate 0.0936 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:28:27,444-Speed 3363.86 samples/sec Loss 15.1240 LearningRate 0.0936 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:28:30,572-Speed 3274.32 samples/sec Loss 15.0730 LearningRate 0.0936 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:33,587-Speed 3397.72 samples/sec Loss 14.8852 LearningRate 0.0936 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:36,667-Speed 3326.39 samples/sec Loss 14.9480 LearningRate 0.0936 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:39,729-Speed 3345.14 samples/sec Loss 15.1346 LearningRate 0.0936 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:42,776-Speed 3362.00 samples/sec Loss 15.1343 LearningRate 0.0936 Epoch: 0 Global Step: 
8140 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:45,806-Speed 3380.48 samples/sec Loss 15.0379 LearningRate 0.0935 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:48,871-Speed 3342.17 samples/sec Loss 14.8474 LearningRate 0.0935 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:51,938-Speed 3340.41 samples/sec Loss 14.8912 LearningRate 0.0935 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:55,010-Speed 3334.21 samples/sec Loss 14.7703 LearningRate 0.0935 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:28:58,040-Speed 3380.76 samples/sec Loss 14.9670 LearningRate 0.0935 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:01,064-Speed 3386.67 samples/sec Loss 15.0207 LearningRate 0.0935 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:29:04,129-Speed 3342.57 samples/sec Loss 14.9389 LearningRate 0.0935 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:29:07,137-Speed 3404.92 samples/sec Loss 14.9171 LearningRate 0.0935 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:29:10,135-Speed 3416.83 samples/sec Loss 14.9299 LearningRate 0.0935 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:13,223-Speed 3316.44 samples/sec Loss 14.7754 LearningRate 0.0935 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:16,251-Speed 3383.20 samples/sec Loss 14.8773 LearningRate 0.0935 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:19,295-Speed 3365.04 samples/sec Loss 14.8752 LearningRate 0.0935 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 32768 Required: 21 hours 
Training: 2022-04-27 02:29:22,307-Speed 3401.41 samples/sec Loss 14.9895 LearningRate 0.0935 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:25,331-Speed 3386.66 samples/sec Loss 14.7227 LearningRate 0.0934 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:28,412-Speed 3324.88 samples/sec Loss 14.7693 LearningRate 0.0934 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:31,482-Speed 3335.76 samples/sec Loss 14.9313 LearningRate 0.0934 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:34,493-Speed 3402.80 samples/sec Loss 14.8712 LearningRate 0.0934 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:37,566-Speed 3333.12 samples/sec Loss 14.7249 LearningRate 0.0934 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:40,629-Speed 3343.90 samples/sec Loss 14.5809 LearningRate 0.0934 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:29:43,678-Speed 3359.54 samples/sec Loss 14.7709 LearningRate 0.0934 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:29:46,746-Speed 3338.73 samples/sec Loss 14.8713 LearningRate 0.0934 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:29:49,765-Speed 3393.12 samples/sec Loss 14.7478 LearningRate 0.0934 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:52,812-Speed 3361.46 samples/sec Loss 14.7736 LearningRate 0.0934 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:55,874-Speed 3345.14 samples/sec Loss 14.8030 LearningRate 0.0934 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:29:58,951-Speed 3328.95 
samples/sec Loss 14.8977 LearningRate 0.0934 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:02,007-Speed 3351.74 samples/sec Loss 14.8011 LearningRate 0.0934 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:05,042-Speed 3374.55 samples/sec Loss 14.8664 LearningRate 0.0933 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:08,088-Speed 3363.30 samples/sec Loss 14.8660 LearningRate 0.0933 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:11,123-Speed 3374.65 samples/sec Loss 14.6399 LearningRate 0.0933 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:14,210-Speed 3318.70 samples/sec Loss 14.6589 LearningRate 0.0933 Epoch: 0 Global Step: 8440 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:17,236-Speed 3384.69 samples/sec Loss 14.6704 LearningRate 0.0933 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:20,302-Speed 3341.28 samples/sec Loss 14.5814 LearningRate 0.0933 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:23,358-Speed 3352.43 samples/sec Loss 14.5642 LearningRate 0.0933 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:26,418-Speed 3347.57 samples/sec Loss 14.6469 LearningRate 0.0933 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:29,443-Speed 3385.77 samples/sec Loss 14.4890 LearningRate 0.0933 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:32,440-Speed 3417.96 samples/sec Loss 14.6400 LearningRate 0.0933 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:35,489-Speed 3360.21 samples/sec Loss 14.7055 LearningRate 0.0933 Epoch: 0 
Global Step: 8510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:38,571-Speed 3322.96 samples/sec Loss 14.4513 LearningRate 0.0933 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:41,622-Speed 3357.23 samples/sec Loss 14.7147 LearningRate 0.0933 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:44,627-Speed 3409.10 samples/sec Loss 14.7262 LearningRate 0.0932 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:47,680-Speed 3355.20 samples/sec Loss 14.4913 LearningRate 0.0932 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:30:50,699-Speed 3392.95 samples/sec Loss 14.7242 LearningRate 0.0932 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:53,761-Speed 3344.52 samples/sec Loss 14.4925 LearningRate 0.0932 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:56,797-Speed 3374.29 samples/sec Loss 14.4723 LearningRate 0.0932 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:30:59,850-Speed 3355.00 samples/sec Loss 14.6302 LearningRate 0.0932 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:31:02,880-Speed 3381.45 samples/sec Loss 14.4860 LearningRate 0.0932 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:31:05,898-Speed 3393.84 samples/sec Loss 14.4917 LearningRate 0.0932 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:31:08,905-Speed 3406.43 samples/sec Loss 14.3838 LearningRate 0.0932 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:31:11,960-Speed 3353.06 samples/sec Loss 14.2668 LearningRate 0.0932 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 32768 Required: 21 
hours Training: 2022-04-27 02:31:14,992-Speed 3378.10 samples/sec Loss 14.4913 LearningRate 0.0932 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:31:18,032-Speed 3368.95 samples/sec Loss 14.3601 LearningRate 0.0932 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 02:31:21,047-Speed 3398.17 samples/sec Loss 14.4875 LearningRate 0.0931 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:24,086-Speed 3370.65 samples/sec Loss 14.3063 LearningRate 0.0931 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:27,120-Speed 3376.27 samples/sec Loss 14.3900 LearningRate 0.0931 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:30,199-Speed 3326.07 samples/sec Loss 14.4752 LearningRate 0.0931 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:33,252-Speed 3355.51 samples/sec Loss 14.4012 LearningRate 0.0931 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:36,356-Speed 3300.39 samples/sec Loss 14.2709 LearningRate 0.0931 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:39,449-Speed 3311.01 samples/sec Loss 14.2855 LearningRate 0.0931 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:42,487-Speed 3372.10 samples/sec Loss 14.3039 LearningRate 0.0931 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:45,499-Speed 3401.16 samples/sec Loss 14.4217 LearningRate 0.0931 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 02:31:48,505-Speed 3407.60 samples/sec Loss 14.3831 LearningRate 0.0931 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:31:51,606-Speed 3302.83 
samples/sec Loss 14.3387 LearningRate 0.0931 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:31:54,660-Speed 3354.35 samples/sec Loss 14.2794 LearningRate 0.0931 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:31:57,696-Speed 3373.24 samples/sec Loss 14.1944 LearningRate 0.0931 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:00,713-Speed 3395.98 samples/sec Loss 14.2583 LearningRate 0.0930 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:03,786-Speed 3332.79 samples/sec Loss 14.2498 LearningRate 0.0930 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:06,847-Speed 3347.07 samples/sec Loss 14.2480 LearningRate 0.0930 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:09,856-Speed 3403.88 samples/sec Loss 14.3114 LearningRate 0.0930 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:12,924-Speed 3338.34 samples/sec Loss 14.3654 LearningRate 0.0930 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:15,957-Speed 3377.52 samples/sec Loss 14.2178 LearningRate 0.0930 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:18,981-Speed 3386.80 samples/sec Loss 14.2984 LearningRate 0.0930 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:32:22,072-Speed 3314.52 samples/sec Loss 14.2728 LearningRate 0.0930 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:32:25,122-Speed 3357.52 samples/sec Loss 14.2254 LearningRate 0.0930 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:28,224-Speed 3303.06 samples/sec Loss 14.0192 LearningRate 0.0930 Epoch: 0 
Global Step: 8880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:31,253-Speed 3381.96 samples/sec Loss 14.0697 LearningRate 0.0930 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:34,271-Speed 3393.13 samples/sec Loss 14.2659 LearningRate 0.0930 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:37,300-Speed 3381.87 samples/sec Loss 14.0966 LearningRate 0.0930 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:40,299-Speed 3416.43 samples/sec Loss 14.2033 LearningRate 0.0929 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:43,321-Speed 3389.12 samples/sec Loss 14.0477 LearningRate 0.0929 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:46,357-Speed 3374.77 samples/sec Loss 14.1236 LearningRate 0.0929 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:49,379-Speed 3389.02 samples/sec Loss 13.9982 LearningRate 0.0929 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:52,480-Speed 3303.64 samples/sec Loss 13.9340 LearningRate 0.0929 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:32:55,489-Speed 3403.36 samples/sec Loss 13.9264 LearningRate 0.0929 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:32:58,488-Speed 3416.41 samples/sec Loss 14.0852 LearningRate 0.0929 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:01,567-Speed 3326.11 samples/sec Loss 14.2248 LearningRate 0.0929 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:04,606-Speed 3370.73 samples/sec Loss 14.1221 LearningRate 0.0929 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 65536 Required: 20 
hours Training: 2022-04-27 02:33:07,611-Speed 3409.50 samples/sec Loss 14.1601 LearningRate 0.0929 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:10,630-Speed 3392.35 samples/sec Loss 14.0928 LearningRate 0.0929 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:13,732-Speed 3301.92 samples/sec Loss 13.9676 LearningRate 0.0929 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:16,745-Speed 3399.84 samples/sec Loss 14.0106 LearningRate 0.0929 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:19,759-Speed 3398.39 samples/sec Loss 13.9173 LearningRate 0.0928 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:22,782-Speed 3388.89 samples/sec Loss 13.8040 LearningRate 0.0928 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:25,851-Speed 3338.52 samples/sec Loss 13.8510 LearningRate 0.0928 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:33:28,891-Speed 3369.29 samples/sec Loss 14.0106 LearningRate 0.0928 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:33:31,925-Speed 3376.00 samples/sec Loss 13.9923 LearningRate 0.0928 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:34,957-Speed 3379.32 samples/sec Loss 13.9112 LearningRate 0.0928 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:37,985-Speed 3382.99 samples/sec Loss 14.0273 LearningRate 0.0928 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:41,005-Speed 3391.48 samples/sec Loss 13.9166 LearningRate 0.0928 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:44,047-Speed 3366.87 
samples/sec Loss 14.0047 LearningRate 0.0928 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:47,143-Speed 3309.12 samples/sec Loss 13.8359 LearningRate 0.0928 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:50,155-Speed 3401.21 samples/sec Loss 13.8300 LearningRate 0.0928 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:53,169-Speed 3398.42 samples/sec Loss 13.8138 LearningRate 0.0928 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:56,184-Speed 3396.95 samples/sec Loss 13.9034 LearningRate 0.0928 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:33:59,249-Speed 3341.87 samples/sec Loss 13.7498 LearningRate 0.0927 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:02,400-Speed 3250.96 samples/sec Loss 13.8470 LearningRate 0.0927 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:05,471-Speed 3336.23 samples/sec Loss 13.8841 LearningRate 0.0927 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:08,510-Speed 3370.67 samples/sec Loss 13.7688 LearningRate 0.0927 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:11,585-Speed 3331.08 samples/sec Loss 13.7472 LearningRate 0.0927 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:14,705-Speed 3282.61 samples/sec Loss 13.9161 LearningRate 0.0927 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:17,746-Speed 3369.06 samples/sec Loss 13.7921 LearningRate 0.0927 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:20,768-Speed 3388.70 samples/sec Loss 13.8402 LearningRate 0.0927 Epoch: 0 
Global Step: 9250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:23,808-Speed 3369.83 samples/sec Loss 13.6683 LearningRate 0.0927 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:26,842-Speed 3376.73 samples/sec Loss 13.7430 LearningRate 0.0927 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:29,886-Speed 3364.87 samples/sec Loss 13.6543 LearningRate 0.0927 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:32,878-Speed 3423.91 samples/sec Loss 13.8918 LearningRate 0.0927 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:34:35,931-Speed 3355.60 samples/sec Loss 13.6885 LearningRate 0.0927 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:38,963-Speed 3377.73 samples/sec Loss 13.7264 LearningRate 0.0926 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:42,011-Speed 3360.03 samples/sec Loss 13.6256 LearningRate 0.0926 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:45,059-Speed 3361.42 samples/sec Loss 13.6380 LearningRate 0.0926 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:48,080-Speed 3390.89 samples/sec Loss 13.6878 LearningRate 0.0926 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:51,104-Speed 3386.92 samples/sec Loss 13.7226 LearningRate 0.0926 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:54,107-Speed 3411.46 samples/sec Loss 13.5638 LearningRate 0.0926 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:34:57,103-Speed 3418.39 samples/sec Loss 13.7973 LearningRate 0.0926 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 65536 Required: 20 
hours Training: 2022-04-27 02:35:00,141-Speed 3372.09 samples/sec Loss 13.7290 LearningRate 0.0926 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:35:03,139-Speed 3416.75 samples/sec Loss 13.6284 LearningRate 0.0926 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:06,168-Speed 3382.08 samples/sec Loss 13.6535 LearningRate 0.0926 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:09,177-Speed 3404.37 samples/sec Loss 13.7013 LearningRate 0.0926 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:12,226-Speed 3358.99 samples/sec Loss 13.6549 LearningRate 0.0926 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:15,257-Speed 3379.75 samples/sec Loss 13.6905 LearningRate 0.0926 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:18,254-Speed 3417.96 samples/sec Loss 13.5795 LearningRate 0.0925 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:21,241-Speed 3428.55 samples/sec Loss 13.6554 LearningRate 0.0925 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:24,322-Speed 3324.76 samples/sec Loss 13.6308 LearningRate 0.0925 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:27,358-Speed 3374.54 samples/sec Loss 13.7458 LearningRate 0.0925 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:30,401-Speed 3366.74 samples/sec Loss 13.6780 LearningRate 0.0925 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:33,422-Speed 3389.85 samples/sec Loss 13.4175 LearningRate 0.0925 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:35:36,466-Speed 3365.62 
samples/sec Loss 13.4381 LearningRate 0.0925 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:35:39,494-Speed 3383.26 samples/sec Loss 13.5684 LearningRate 0.0925 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:35:42,494-Speed 3414.88 samples/sec Loss 13.4829 LearningRate 0.0925 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:35:45,502-Speed 3404.52 samples/sec Loss 13.4927 LearningRate 0.0925 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:48,536-Speed 3376.48 samples/sec Loss 13.5491 LearningRate 0.0925 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:51,610-Speed 3332.01 samples/sec Loss 13.4524 LearningRate 0.0925 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:54,627-Speed 3396.17 samples/sec Loss 13.6139 LearningRate 0.0925 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:35:57,627-Speed 3414.20 samples/sec Loss 13.2932 LearningRate 0.0924 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:00,644-Speed 3396.13 samples/sec Loss 13.6552 LearningRate 0.0924 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:03,695-Speed 3356.35 samples/sec Loss 13.5357 LearningRate 0.0924 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:06,751-Speed 3352.27 samples/sec Loss 13.6083 LearningRate 0.0924 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:09,763-Speed 3400.76 samples/sec Loss 13.5619 LearningRate 0.0924 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:12,780-Speed 3395.71 samples/sec Loss 13.5152 LearningRate 0.0924 Epoch: 0 
Global Step: 9620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:15,791-Speed 3401.90 samples/sec Loss 13.4892 LearningRate 0.0924 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:18,846-Speed 3352.95 samples/sec Loss 13.4938 LearningRate 0.0924 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:21,856-Speed 3401.84 samples/sec Loss 13.4456 LearningRate 0.0924 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:24,946-Speed 3315.52 samples/sec Loss 13.3534 LearningRate 0.0924 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:28,042-Speed 3308.09 samples/sec Loss 13.2739 LearningRate 0.0924 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:31,086-Speed 3365.35 samples/sec Loss 13.3646 LearningRate 0.0924 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:34,107-Speed 3390.92 samples/sec Loss 13.3759 LearningRate 0.0924 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:37,141-Speed 3376.31 samples/sec Loss 13.3094 LearningRate 0.0923 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:40,139-Speed 3417.05 samples/sec Loss 13.4222 LearningRate 0.0923 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:43,164-Speed 3386.10 samples/sec Loss 13.3745 LearningRate 0.0923 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:36:46,183-Speed 3392.58 samples/sec Loss 13.3955 LearningRate 0.0923 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:49,185-Speed 3412.19 samples/sec Loss 13.4392 LearningRate 0.0923 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 32768 Required: 20 
hours Training: 2022-04-27 02:36:52,246-Speed 3347.47 samples/sec Loss 13.3140 LearningRate 0.0923 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:55,281-Speed 3374.43 samples/sec Loss 13.4611 LearningRate 0.0923 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:36:58,342-Speed 3346.51 samples/sec Loss 13.3232 LearningRate 0.0923 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:01,401-Speed 3348.88 samples/sec Loss 13.3496 LearningRate 0.0923 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:04,460-Speed 3348.68 samples/sec Loss 13.2745 LearningRate 0.0923 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:07,524-Speed 3342.35 samples/sec Loss 13.3589 LearningRate 0.0923 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:10,590-Speed 3340.97 samples/sec Loss 13.3502 LearningRate 0.0923 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:13,646-Speed 3352.53 samples/sec Loss 13.3608 LearningRate 0.0923 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:16,676-Speed 3380.42 samples/sec Loss 13.4085 LearningRate 0.0922 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:19,694-Speed 3393.22 samples/sec Loss 13.4087 LearningRate 0.0922 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:22,730-Speed 3374.63 samples/sec Loss 13.2685 LearningRate 0.0922 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:25,810-Speed 3325.83 samples/sec Loss 13.3202 LearningRate 0.0922 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:28,908-Speed 3305.77 
samples/sec Loss 13.2487 LearningRate 0.0922 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:31,983-Speed 3331.80 samples/sec Loss 13.1921 LearningRate 0.0922 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:35,004-Speed 3390.62 samples/sec Loss 13.1378 LearningRate 0.0922 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:38,071-Speed 3339.52 samples/sec Loss 13.2293 LearningRate 0.0922 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:41,254-Speed 3218.43 samples/sec Loss 13.2845 LearningRate 0.0922 Epoch: 0 Global Step: 9910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:44,292-Speed 3371.73 samples/sec Loss 13.0277 LearningRate 0.0922 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:37:47,322-Speed 3380.41 samples/sec Loss 13.3413 LearningRate 0.0922 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:37:50,341-Speed 3393.41 samples/sec Loss 13.1154 LearningRate 0.0922 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:37:53,389-Speed 3360.33 samples/sec Loss 13.3166 LearningRate 0.0921 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:37:56,467-Speed 3327.81 samples/sec Loss 13.2109 LearningRate 0.0921 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:37:59,511-Speed 3365.16 samples/sec Loss 13.3490 LearningRate 0.0921 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:38:02,614-Speed 3300.79 samples/sec Loss 13.1495 LearningRate 0.0921 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:05,706-Speed 3313.51 samples/sec Loss 12.9924 LearningRate 0.0921 Epoch: 0 
Global Step: 9990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:08,732-Speed 3384.44 samples/sec Loss 13.2544 LearningRate 0.0921 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:11,744-Speed 3401.25 samples/sec Loss 13.0429 LearningRate 0.0921 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:14,815-Speed 3335.32 samples/sec Loss 13.1700 LearningRate 0.0921 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:17,883-Speed 3338.28 samples/sec Loss 13.1067 LearningRate 0.0921 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:20,923-Speed 3369.49 samples/sec Loss 12.9932 LearningRate 0.0921 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:23,954-Speed 3380.39 samples/sec Loss 13.1945 LearningRate 0.0921 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:27,040-Speed 3318.41 samples/sec Loss 13.1891 LearningRate 0.0921 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:30,136-Speed 3308.74 samples/sec Loss 13.0342 LearningRate 0.0921 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:33,163-Speed 3384.00 samples/sec Loss 13.1553 LearningRate 0.0920 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:38:36,202-Speed 3370.91 samples/sec Loss 13.0028 LearningRate 0.0920 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:38:39,231-Speed 3381.34 samples/sec Loss 13.2067 LearningRate 0.0920 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:42,316-Speed 3320.59 samples/sec Loss 13.3023 LearningRate 0.0920 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-04-27 02:38:45,349-Speed 3376.41 samples/sec Loss 12.9746 LearningRate 0.0920 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:48,361-Speed 3401.15 samples/sec Loss 12.9997 LearningRate 0.0920 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:51,403-Speed 3367.38 samples/sec Loss 12.8941 LearningRate 0.0920 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:54,407-Speed 3409.84 samples/sec Loss 12.9674 LearningRate 0.0920 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:38:57,415-Speed 3405.56 samples/sec Loss 13.0863 LearningRate 0.0920 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:00,451-Speed 3373.54 samples/sec Loss 12.9799 LearningRate 0.0920 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:03,521-Speed 3337.09 samples/sec Loss 13.0145 LearningRate 0.0920 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:06,658-Speed 3265.02 samples/sec Loss 13.0330 LearningRate 0.0920 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:09,726-Speed 3338.76 samples/sec Loss 13.0461 LearningRate 0.0920 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:39:12,759-Speed 3376.55 samples/sec Loss 13.1437 LearningRate 0.0919 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:39:15,798-Speed 3371.05 samples/sec Loss 13.0921 LearningRate 0.0919 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:39:18,913-Speed 3288.67 samples/sec Loss 12.9892 LearningRate 0.0919 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
02:39:21,927-Speed 3398.16 samples/sec Loss 13.0811 LearningRate 0.0919 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:24,972-Speed 3364.29 samples/sec Loss 12.9249 LearningRate 0.0919 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:28,029-Speed 3350.33 samples/sec Loss 12.9698 LearningRate 0.0919 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:31,083-Speed 3354.62 samples/sec Loss 12.8211 LearningRate 0.0919 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:34,102-Speed 3392.87 samples/sec Loss 12.8753 LearningRate 0.0919 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:37,181-Speed 3326.46 samples/sec Loss 12.8056 LearningRate 0.0919 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:40,202-Speed 3390.86 samples/sec Loss 12.9527 LearningRate 0.0919 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:43,224-Speed 3390.10 samples/sec Loss 12.7326 LearningRate 0.0919 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:46,284-Speed 3347.09 samples/sec Loss 12.8792 LearningRate 0.0919 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:49,377-Speed 3312.36 samples/sec Loss 12.9665 LearningRate 0.0919 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:39:52,378-Speed 3412.65 samples/sec Loss 12.9085 LearningRate 0.0918 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:39:55,439-Speed 3346.20 samples/sec Loss 12.8804 LearningRate 0.0918 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:39:58,485-Speed 3363.37 samples/sec Loss 
12.8597 LearningRate 0.0918 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:01,507-Speed 3389.06 samples/sec Loss 12.7877 LearningRate 0.0918 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:04,520-Speed 3400.16 samples/sec Loss 12.7994 LearningRate 0.0918 Epoch: 0 Global Step: 10380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:07,515-Speed 3420.37 samples/sec Loss 12.8250 LearningRate 0.0918 Epoch: 0 Global Step: 10390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:10,530-Speed 3397.87 samples/sec Loss 12.8148 LearningRate 0.0918 Epoch: 0 Global Step: 10400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:13,592-Speed 3345.31 samples/sec Loss 12.7639 LearningRate 0.0918 Epoch: 0 Global Step: 10410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:16,610-Speed 3393.98 samples/sec Loss 12.9037 LearningRate 0.0918 Epoch: 0 Global Step: 10420 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:19,626-Speed 3395.63 samples/sec Loss 12.7277 LearningRate 0.0918 Epoch: 0 Global Step: 10430 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:22,654-Speed 3383.34 samples/sec Loss 12.8476 LearningRate 0.0918 Epoch: 0 Global Step: 10440 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:25,673-Speed 3392.95 samples/sec Loss 12.8395 LearningRate 0.0918 Epoch: 0 Global Step: 10450 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:28,741-Speed 3338.89 samples/sec Loss 12.6736 LearningRate 0.0918 Epoch: 0 Global Step: 10460 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:31,789-Speed 3360.55 samples/sec Loss 12.8135 LearningRate 0.0917 Epoch: 0 Global Step: 10470 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:34,829-Speed 3369.42 samples/sec Loss 12.9076 LearningRate 0.0917 Epoch: 0 Global 
Step: 10480 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:37,867-Speed 3371.77 samples/sec Loss 12.8452 LearningRate 0.0917 Epoch: 0 Global Step: 10490 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:40,872-Speed 3408.70 samples/sec Loss 12.7964 LearningRate 0.0917 Epoch: 0 Global Step: 10500 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:43,916-Speed 3365.38 samples/sec Loss 12.8333 LearningRate 0.0917 Epoch: 0 Global Step: 10510 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:40:46,970-Speed 3353.76 samples/sec Loss 12.6897 LearningRate 0.0917 Epoch: 0 Global Step: 10520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:50,033-Speed 3344.59 samples/sec Loss 12.6303 LearningRate 0.0917 Epoch: 0 Global Step: 10530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:53,095-Speed 3344.82 samples/sec Loss 12.9099 LearningRate 0.0917 Epoch: 0 Global Step: 10540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:56,126-Speed 3379.73 samples/sec Loss 12.6825 LearningRate 0.0917 Epoch: 0 Global Step: 10550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:40:59,120-Speed 3421.55 samples/sec Loss 12.8472 LearningRate 0.0917 Epoch: 0 Global Step: 10560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:02,136-Speed 3396.58 samples/sec Loss 12.8634 LearningRate 0.0917 Epoch: 0 Global Step: 10570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:05,261-Speed 3278.21 samples/sec Loss 12.8268 LearningRate 0.0917 Epoch: 0 Global Step: 10580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:08,322-Speed 3346.42 samples/sec Loss 12.7065 LearningRate 0.0917 Epoch: 0 Global Step: 10590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:11,355-Speed 3377.06 samples/sec Loss 12.6862 LearningRate 0.0916 Epoch: 0 Global Step: 10600 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-04-27 02:41:14,416-Speed 3345.78 samples/sec Loss 12.6418 LearningRate 0.0916 Epoch: 0 Global Step: 10610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:17,460-Speed 3365.26 samples/sec Loss 12.6152 LearningRate 0.0916 Epoch: 0 Global Step: 10620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:41:20,472-Speed 3401.35 samples/sec Loss 12.7147 LearningRate 0.0916 Epoch: 0 Global Step: 10630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:41:23,501-Speed 3381.18 samples/sec Loss 12.6051 LearningRate 0.0916 Epoch: 0 Global Step: 10640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:41:26,562-Speed 3346.30 samples/sec Loss 12.8186 LearningRate 0.0916 Epoch: 0 Global Step: 10650 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:29,603-Speed 3369.06 samples/sec Loss 12.7358 LearningRate 0.0916 Epoch: 0 Global Step: 10660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:32,614-Speed 3401.65 samples/sec Loss 12.5729 LearningRate 0.0916 Epoch: 0 Global Step: 10670 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:35,713-Speed 3305.91 samples/sec Loss 12.6684 LearningRate 0.0916 Epoch: 0 Global Step: 10680 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:38,811-Speed 3306.11 samples/sec Loss 12.7678 LearningRate 0.0916 Epoch: 0 Global Step: 10690 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:41,880-Speed 3338.75 samples/sec Loss 12.6509 LearningRate 0.0916 Epoch: 0 Global Step: 10700 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:44,924-Speed 3364.44 samples/sec Loss 12.8302 LearningRate 0.0916 Epoch: 0 Global Step: 10710 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:47,987-Speed 3344.25 samples/sec Loss 12.6608 LearningRate 0.0916 Epoch: 0 Global Step: 10720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
02:41:51,096-Speed 3295.20 samples/sec Loss 12.7148 LearningRate 0.0915 Epoch: 0 Global Step: 10730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:54,169-Speed 3333.30 samples/sec Loss 12.5626 LearningRate 0.0915 Epoch: 0 Global Step: 10740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:41:57,206-Speed 3372.93 samples/sec Loss 12.5629 LearningRate 0.0915 Epoch: 0 Global Step: 10750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:00,218-Speed 3399.81 samples/sec Loss 12.4831 LearningRate 0.0915 Epoch: 0 Global Step: 10760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:03,308-Speed 3314.82 samples/sec Loss 12.4265 LearningRate 0.0915 Epoch: 0 Global Step: 10770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:06,406-Speed 3307.22 samples/sec Loss 12.7082 LearningRate 0.0915 Epoch: 0 Global Step: 10780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:09,418-Speed 3401.08 samples/sec Loss 12.7474 LearningRate 0.0915 Epoch: 0 Global Step: 10790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:12,416-Speed 3416.26 samples/sec Loss 12.5879 LearningRate 0.0915 Epoch: 0 Global Step: 10800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:15,466-Speed 3358.86 samples/sec Loss 12.4613 LearningRate 0.0915 Epoch: 0 Global Step: 10810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:18,521-Speed 3352.83 samples/sec Loss 12.6699 LearningRate 0.0915 Epoch: 0 Global Step: 10820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:21,531-Speed 3402.55 samples/sec Loss 12.6413 LearningRate 0.0915 Epoch: 0 Global Step: 10830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:24,563-Speed 3378.86 samples/sec Loss 12.5891 LearningRate 0.0915 Epoch: 0 Global Step: 10840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:27,651-Speed 3317.52 samples/sec Loss 
12.3438 LearningRate 0.0915 Epoch: 0 Global Step: 10850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:30,676-Speed 3385.82 samples/sec Loss 12.6449 LearningRate 0.0914 Epoch: 0 Global Step: 10860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:33,674-Speed 3417.16 samples/sec Loss 12.7317 LearningRate 0.0914 Epoch: 0 Global Step: 10870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:36,701-Speed 3383.58 samples/sec Loss 12.6292 LearningRate 0.0914 Epoch: 0 Global Step: 10880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:39,836-Speed 3267.67 samples/sec Loss 12.5438 LearningRate 0.0914 Epoch: 0 Global Step: 10890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:42,930-Speed 3310.90 samples/sec Loss 12.4986 LearningRate 0.0914 Epoch: 0 Global Step: 10900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:42:45,951-Speed 3390.62 samples/sec Loss 12.4127 LearningRate 0.0914 Epoch: 0 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:49,001-Speed 3358.22 samples/sec Loss 12.5488 LearningRate 0.0914 Epoch: 0 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:52,021-Speed 3392.07 samples/sec Loss 12.6137 LearningRate 0.0914 Epoch: 0 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:55,061-Speed 3370.30 samples/sec Loss 12.6558 LearningRate 0.0914 Epoch: 0 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:42:58,075-Speed 3398.05 samples/sec Loss 12.4251 LearningRate 0.0914 Epoch: 0 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:01,073-Speed 3416.84 samples/sec Loss 12.5444 LearningRate 0.0914 Epoch: 0 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:04,069-Speed 3419.90 samples/sec Loss 12.4616 LearningRate 0.0914 Epoch: 0 Global 
Step: 10970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:07,121-Speed 3356.35 samples/sec Loss 12.5369 LearningRate 0.0914 Epoch: 0 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:10,168-Speed 3362.23 samples/sec Loss 12.4454 LearningRate 0.0913 Epoch: 0 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:13,202-Speed 3375.45 samples/sec Loss 12.4451 LearningRate 0.0913 Epoch: 0 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:16,241-Speed 3370.96 samples/sec Loss 12.4587 LearningRate 0.0913 Epoch: 0 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:43:19,336-Speed 3309.25 samples/sec Loss 12.3741 LearningRate 0.0913 Epoch: 0 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:43:22,382-Speed 3362.80 samples/sec Loss 12.5373 LearningRate 0.0913 Epoch: 0 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:43:25,482-Speed 3305.28 samples/sec Loss 12.4917 LearningRate 0.0913 Epoch: 0 Global Step: 11040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:28,520-Speed 3371.41 samples/sec Loss 12.4217 LearningRate 0.0913 Epoch: 0 Global Step: 11050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:31,569-Speed 3359.73 samples/sec Loss 12.4356 LearningRate 0.0913 Epoch: 0 Global Step: 11060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:34,622-Speed 3355.58 samples/sec Loss 12.4785 LearningRate 0.0913 Epoch: 0 Global Step: 11070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:43:37,693-Speed 3334.51 samples/sec Loss 12.4360 LearningRate 0.0913 Epoch: 0 Global Step: 11080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:43:40,777-Speed 3321.75 samples/sec Loss 12.5667 LearningRate 0.0913 Epoch: 0 Global Step: 11090 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-04-27 02:43:43,797-Speed 3391.60 samples/sec Loss 12.4319 LearningRate 0.0913 Epoch: 0 Global Step: 11100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:43:46,845-Speed 3361.26 samples/sec Loss 12.2174 LearningRate 0.0913 Epoch: 0 Global Step: 11110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:43:49,895-Speed 3358.40 samples/sec Loss 12.4540 LearningRate 0.0912 Epoch: 0 Global Step: 11120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:43:52,978-Speed 3322.54 samples/sec Loss 12.2866 LearningRate 0.0912 Epoch: 0 Global Step: 11130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:43:56,045-Speed 3339.96 samples/sec Loss 12.4140 LearningRate 0.0912 Epoch: 0 Global Step: 11140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:43:59,055-Speed 3403.06 samples/sec Loss 12.3055 LearningRate 0.0912 Epoch: 0 Global Step: 11150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:02,117-Speed 3345.13 samples/sec Loss 12.3647 LearningRate 0.0912 Epoch: 0 Global Step: 11160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:05,193-Speed 3330.00 samples/sec Loss 12.5069 LearningRate 0.0912 Epoch: 0 Global Step: 11170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:08,276-Speed 3322.58 samples/sec Loss 12.4159 LearningRate 0.0912 Epoch: 0 Global Step: 11180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:44:11,321-Speed 3364.75 samples/sec Loss 12.3496 LearningRate 0.0912 Epoch: 0 Global Step: 11190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:44:14,346-Speed 3385.45 samples/sec Loss 12.4869 LearningRate 0.0912 Epoch: 0 Global Step: 11200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:44:17,422-Speed 3330.63 samples/sec Loss 12.3013 LearningRate 0.0912 Epoch: 0 Global Step: 11210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 
02:44:20,500-Speed 3327.86 samples/sec Loss 12.2409 LearningRate 0.0912 Epoch: 0 Global Step: 11220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:44:23,519-Speed 3391.91 samples/sec Loss 12.2163 LearningRate 0.0912 Epoch: 0 Global Step: 11230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:26,605-Speed 3319.29 samples/sec Loss 12.2648 LearningRate 0.0912 Epoch: 0 Global Step: 11240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:29,748-Speed 3259.01 samples/sec Loss 12.3235 LearningRate 0.0911 Epoch: 0 Global Step: 11250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:32,787-Speed 3370.70 samples/sec Loss 12.4295 LearningRate 0.0911 Epoch: 0 Global Step: 11260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:35,875-Speed 3317.46 samples/sec Loss 12.3528 LearningRate 0.0911 Epoch: 0 Global Step: 11270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:38,940-Speed 3341.81 samples/sec Loss 12.3846 LearningRate 0.0911 Epoch: 0 Global Step: 11280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:41,987-Speed 3361.15 samples/sec Loss 12.2269 LearningRate 0.0911 Epoch: 0 Global Step: 11290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:45,004-Speed 3395.88 samples/sec Loss 12.2652 LearningRate 0.0911 Epoch: 0 Global Step: 11300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:48,005-Speed 3412.96 samples/sec Loss 12.3527 LearningRate 0.0911 Epoch: 0 Global Step: 11310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:51,033-Speed 3383.76 samples/sec Loss 12.3049 LearningRate 0.0911 Epoch: 0 Global Step: 11320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:44:54,106-Speed 3333.02 samples/sec Loss 12.4169 LearningRate 0.0911 Epoch: 0 Global Step: 11330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:44:57,149-Speed 3366.11 samples/sec Loss 
12.3095 LearningRate 0.0911 Epoch: 0 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:00,233-Speed 3320.86 samples/sec Loss 12.4101 LearningRate 0.0911 Epoch: 0 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:03,325-Speed 3312.75 samples/sec Loss 12.3854 LearningRate 0.0911 Epoch: 0 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:06,358-Speed 3376.82 samples/sec Loss 12.1836 LearningRate 0.0911 Epoch: 0 Global Step: 11370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:09,391-Speed 3378.43 samples/sec Loss 12.2166 LearningRate 0.0910 Epoch: 0 Global Step: 11380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:12,451-Speed 3347.58 samples/sec Loss 12.2006 LearningRate 0.0910 Epoch: 0 Global Step: 11390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:15,511-Speed 3347.56 samples/sec Loss 12.1845 LearningRate 0.0910 Epoch: 0 Global Step: 11400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:18,514-Speed 3411.11 samples/sec Loss 12.1684 LearningRate 0.0910 Epoch: 0 Global Step: 11410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:21,535-Speed 3390.41 samples/sec Loss 12.3038 LearningRate 0.0910 Epoch: 0 Global Step: 11420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:24,626-Speed 3313.02 samples/sec Loss 12.3296 LearningRate 0.0910 Epoch: 0 Global Step: 11430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:27,709-Speed 3323.26 samples/sec Loss 12.1004 LearningRate 0.0910 Epoch: 0 Global Step: 11440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:30,745-Speed 3374.10 samples/sec Loss 12.2602 LearningRate 0.0910 Epoch: 0 Global Step: 11450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:33,788-Speed 3366.01 samples/sec Loss 12.1567 LearningRate 0.0910 Epoch: 0 Global 
Step: 11460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:45:36,867-Speed 3326.31 samples/sec Loss 12.1310 LearningRate 0.0910 Epoch: 0 Global Step: 11470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:39,904-Speed 3372.68 samples/sec Loss 12.0516 LearningRate 0.0910 Epoch: 0 Global Step: 11480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:43,009-Speed 3299.21 samples/sec Loss 12.2255 LearningRate 0.0910 Epoch: 0 Global Step: 11490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:46,017-Speed 3405.61 samples/sec Loss 12.2035 LearningRate 0.0910 Epoch: 0 Global Step: 11500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:49,058-Speed 3368.50 samples/sec Loss 12.1446 LearningRate 0.0909 Epoch: 0 Global Step: 11510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:52,088-Speed 3380.93 samples/sec Loss 12.1153 LearningRate 0.0909 Epoch: 0 Global Step: 11520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:55,142-Speed 3353.83 samples/sec Loss 12.1489 LearningRate 0.0909 Epoch: 0 Global Step: 11530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:45:58,153-Speed 3401.88 samples/sec Loss 12.0952 LearningRate 0.0909 Epoch: 0 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:01,244-Speed 3313.29 samples/sec Loss 12.0489 LearningRate 0.0909 Epoch: 0 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:04,274-Speed 3381.19 samples/sec Loss 12.0744 LearningRate 0.0909 Epoch: 0 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:07,354-Speed 3326.33 samples/sec Loss 12.2023 LearningRate 0.0909 Epoch: 0 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:46:10,339-Speed 3430.80 samples/sec Loss 12.0662 LearningRate 0.0909 Epoch: 0 Global Step: 11580 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-04-27 02:46:13,351-Speed 3401.11 samples/sec Loss 12.1832 LearningRate 0.0909 Epoch: 0 Global Step: 11590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:16,357-Speed 3408.05 samples/sec Loss 12.1418 LearningRate 0.0909 Epoch: 0 Global Step: 11600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:19,383-Speed 3384.40 samples/sec Loss 12.1346 LearningRate 0.0909 Epoch: 0 Global Step: 11610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:22,411-Speed 3382.97 samples/sec Loss 12.0716 LearningRate 0.0909 Epoch: 0 Global Step: 11620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:25,459-Speed 3360.92 samples/sec Loss 12.0258 LearningRate 0.0909 Epoch: 0 Global Step: 11630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:28,502-Speed 3366.11 samples/sec Loss 12.3459 LearningRate 0.0908 Epoch: 0 Global Step: 11640 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:31,508-Speed 3408.08 samples/sec Loss 12.0037 LearningRate 0.0908 Epoch: 0 Global Step: 11650 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:34,516-Speed 3404.67 samples/sec Loss 12.1782 LearningRate 0.0908 Epoch: 0 Global Step: 11660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:37,563-Speed 3362.80 samples/sec Loss 12.2566 LearningRate 0.0908 Epoch: 0 Global Step: 11670 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:46:40,592-Speed 3381.45 samples/sec Loss 12.0585 LearningRate 0.0908 Epoch: 0 Global Step: 11680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:43,687-Speed 3310.09 samples/sec Loss 11.9055 LearningRate 0.0908 Epoch: 0 Global Step: 11690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:46,748-Speed 3346.42 samples/sec Loss 12.2612 LearningRate 0.0908 Epoch: 0 Global Step: 11700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 
02:46:49,773-Speed 3385.11 samples/sec Loss 11.9351 LearningRate 0.0908 Epoch: 0 Global Step: 11710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:52,810-Speed 3373.49 samples/sec Loss 12.1648 LearningRate 0.0908 Epoch: 0 Global Step: 11720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:55,828-Speed 3393.86 samples/sec Loss 11.9914 LearningRate 0.0908 Epoch: 0 Global Step: 11730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:46:58,868-Speed 3370.10 samples/sec Loss 12.0221 LearningRate 0.0908 Epoch: 0 Global Step: 11740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:01,963-Speed 3309.10 samples/sec Loss 12.0515 LearningRate 0.0908 Epoch: 0 Global Step: 11750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:05,032-Speed 3337.51 samples/sec Loss 12.0796 LearningRate 0.0908 Epoch: 0 Global Step: 11760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:08,082-Speed 3359.39 samples/sec Loss 12.2045 LearningRate 0.0907 Epoch: 0 Global Step: 11770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:11,133-Speed 3356.45 samples/sec Loss 12.0043 LearningRate 0.0907 Epoch: 0 Global Step: 11780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:14,164-Speed 3379.95 samples/sec Loss 11.9407 LearningRate 0.0907 Epoch: 0 Global Step: 11790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:17,209-Speed 3363.67 samples/sec Loss 11.9971 LearningRate 0.0907 Epoch: 0 Global Step: 11800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:20,278-Speed 3337.35 samples/sec Loss 12.1802 LearningRate 0.0907 Epoch: 0 Global Step: 11810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:23,319-Speed 3368.72 samples/sec Loss 11.9067 LearningRate 0.0907 Epoch: 0 Global Step: 11820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:26,366-Speed 3361.75 samples/sec Loss 
11.9783 LearningRate 0.0907 Epoch: 0 Global Step: 11830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:29,443-Speed 3329.44 samples/sec Loss 12.1310 LearningRate 0.0907 Epoch: 0 Global Step: 11840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:32,480-Speed 3373.01 samples/sec Loss 12.0366 LearningRate 0.0907 Epoch: 0 Global Step: 11850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:35,516-Speed 3373.08 samples/sec Loss 11.9550 LearningRate 0.0907 Epoch: 0 Global Step: 11860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:38,555-Speed 3370.63 samples/sec Loss 11.9759 LearningRate 0.0907 Epoch: 0 Global Step: 11870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:41,588-Speed 3377.23 samples/sec Loss 11.8795 LearningRate 0.0907 Epoch: 0 Global Step: 11880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:47:44,627-Speed 3370.97 samples/sec Loss 11.9290 LearningRate 0.0907 Epoch: 0 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:47,659-Speed 3378.92 samples/sec Loss 12.0518 LearningRate 0.0906 Epoch: 0 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:50,666-Speed 3406.65 samples/sec Loss 11.8489 LearningRate 0.0906 Epoch: 0 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:53,726-Speed 3346.99 samples/sec Loss 11.9167 LearningRate 0.0906 Epoch: 0 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:56,787-Speed 3346.46 samples/sec Loss 12.0988 LearningRate 0.0906 Epoch: 0 Global Step: 11930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:47:59,820-Speed 3377.09 samples/sec Loss 11.9657 LearningRate 0.0906 Epoch: 0 Global Step: 11940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:02,845-Speed 3386.63 samples/sec Loss 12.0401 LearningRate 0.0906 Epoch: 0 Global 
Step: 11950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:05,882-Speed 3373.10 samples/sec Loss 11.7161 LearningRate 0.0906 Epoch: 0 Global Step: 11960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:08,888-Speed 3406.98 samples/sec Loss 12.0802 LearningRate 0.0906 Epoch: 0 Global Step: 11970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:11,953-Speed 3342.34 samples/sec Loss 11.8197 LearningRate 0.0906 Epoch: 0 Global Step: 11980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:15,050-Speed 3307.49 samples/sec Loss 11.9196 LearningRate 0.0906 Epoch: 0 Global Step: 11990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:18,078-Speed 3383.16 samples/sec Loss 11.8651 LearningRate 0.0906 Epoch: 0 Global Step: 12000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:21,105-Speed 3383.58 samples/sec Loss 11.8956 LearningRate 0.0906 Epoch: 0 Global Step: 12010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:24,130-Speed 3387.05 samples/sec Loss 11.9605 LearningRate 0.0906 Epoch: 0 Global Step: 12020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:27,172-Speed 3366.11 samples/sec Loss 11.9452 LearningRate 0.0905 Epoch: 0 Global Step: 12030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:30,246-Speed 3332.77 samples/sec Loss 11.9377 LearningRate 0.0905 Epoch: 0 Global Step: 12040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:33,271-Speed 3386.66 samples/sec Loss 11.8886 LearningRate 0.0905 Epoch: 0 Global Step: 12050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:36,324-Speed 3354.79 samples/sec Loss 11.9496 LearningRate 0.0905 Epoch: 0 Global Step: 12060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:39,357-Speed 3377.53 samples/sec Loss 11.9651 LearningRate 0.0905 Epoch: 0 Global Step: 12070 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-27 02:48:42,366-Speed 3403.56 samples/sec Loss 11.8474 LearningRate 0.0905 Epoch: 0 Global Step: 12080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:45,398-Speed 3378.75 samples/sec Loss 11.7866 LearningRate 0.0905 Epoch: 0 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:48:48,470-Speed 3334.56 samples/sec Loss 11.7046 LearningRate 0.0905 Epoch: 0 Global Step: 12100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:48:51,518-Speed 3360.55 samples/sec Loss 11.9334 LearningRate 0.0905 Epoch: 0 Global Step: 12110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:48:54,580-Speed 3344.76 samples/sec Loss 11.9109 LearningRate 0.0905 Epoch: 0 Global Step: 12120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:48:57,612-Speed 3378.63 samples/sec Loss 11.7555 LearningRate 0.0905 Epoch: 0 Global Step: 12130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:49:00,663-Speed 3357.97 samples/sec Loss 11.9257 LearningRate 0.0905 Epoch: 0 Global Step: 12140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:49:03,715-Speed 3355.21 samples/sec Loss 11.9934 LearningRate 0.0905 Epoch: 0 Global Step: 12150 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:06,749-Speed 3376.27 samples/sec Loss 11.9373 LearningRate 0.0904 Epoch: 0 Global Step: 12160 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:09,748-Speed 3416.18 samples/sec Loss 11.8597 LearningRate 0.0904 Epoch: 0 Global Step: 12170 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:12,842-Speed 3310.80 samples/sec Loss 11.8656 LearningRate 0.0904 Epoch: 0 Global Step: 12180 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:15,909-Speed 3339.77 samples/sec Loss 11.8597 LearningRate 0.0904 Epoch: 0 Global Step: 12190 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 
02:49:19,000-Speed 3313.59 samples/sec Loss 11.8205 LearningRate 0.0904 Epoch: 0 Global Step: 12200 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:22,012-Speed 3400.94 samples/sec Loss 11.7629 LearningRate 0.0904 Epoch: 0 Global Step: 12210 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:25,035-Speed 3388.70 samples/sec Loss 11.8096 LearningRate 0.0904 Epoch: 0 Global Step: 12220 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:28,068-Speed 3376.71 samples/sec Loss 11.7176 LearningRate 0.0904 Epoch: 0 Global Step: 12230 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:31,143-Speed 3331.50 samples/sec Loss 11.8382 LearningRate 0.0904 Epoch: 0 Global Step: 12240 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:34,200-Speed 3351.10 samples/sec Loss 11.7799 LearningRate 0.0904 Epoch: 0 Global Step: 12250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:49:37,271-Speed 3335.46 samples/sec Loss 11.7010 LearningRate 0.0904 Epoch: 0 Global Step: 12260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:49:40,357-Speed 3319.09 samples/sec Loss 11.8429 LearningRate 0.0904 Epoch: 0 Global Step: 12270 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:43,396-Speed 3371.44 samples/sec Loss 11.7596 LearningRate 0.0904 Epoch: 0 Global Step: 12280 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:46,414-Speed 3393.78 samples/sec Loss 11.8649 LearningRate 0.0904 Epoch: 0 Global Step: 12290 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:49,487-Speed 3332.79 samples/sec Loss 11.7960 LearningRate 0.0903 Epoch: 0 Global Step: 12300 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:52,518-Speed 3380.27 samples/sec Loss 11.8904 LearningRate 0.0903 Epoch: 0 Global Step: 12310 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:55,535-Speed 3394.87 samples/sec Loss 
11.7275 LearningRate 0.0903 Epoch: 0 Global Step: 12320 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:49:58,551-Speed 3396.24 samples/sec Loss 11.7850 LearningRate 0.0903 Epoch: 0 Global Step: 12330 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:50:01,617-Speed 3340.69 samples/sec Loss 11.7975 LearningRate 0.0903 Epoch: 0 Global Step: 12340 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:50:04,670-Speed 3355.73 samples/sec Loss 11.8215 LearningRate 0.0903 Epoch: 0 Global Step: 12350 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:50:07,708-Speed 3371.72 samples/sec Loss 11.7039 LearningRate 0.0903 Epoch: 0 Global Step: 12360 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:50:10,769-Speed 3346.32 samples/sec Loss 11.7376 LearningRate 0.0903 Epoch: 0 Global Step: 12370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:50:13,815-Speed 3362.89 samples/sec Loss 11.8371 LearningRate 0.0903 Epoch: 0 Global Step: 12380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:50:16,867-Speed 3356.19 samples/sec Loss 11.7800 LearningRate 0.0903 Epoch: 0 Global Step: 12390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:50:19,908-Speed 3368.64 samples/sec Loss 11.8435 LearningRate 0.0903 Epoch: 0 Global Step: 12400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:50:23,127-Speed 3182.06 samples/sec Loss 11.8139 LearningRate 0.0903 Epoch: 0 Global Step: 12410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:50:26,157-Speed 3381.18 samples/sec Loss 11.6703 LearningRate 0.0903 Epoch: 0 Global Step: 12420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:50:57,550-Speed 326.20 samples/sec Loss 10.1249 LearningRate 0.0902 Epoch: 1 Global Step: 12430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:00,758-Speed 3193.70 samples/sec Loss 9.9571 LearningRate 0.0902 Epoch: 1 Global 
Step: 12440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:03,785-Speed 3384.11 samples/sec Loss 9.7992 LearningRate 0.0902 Epoch: 1 Global Step: 12450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:06,813-Speed 3382.44 samples/sec Loss 9.9050 LearningRate 0.0902 Epoch: 1 Global Step: 12460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:09,825-Speed 3400.78 samples/sec Loss 9.7789 LearningRate 0.0902 Epoch: 1 Global Step: 12470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:12,920-Speed 3309.75 samples/sec Loss 9.7056 LearningRate 0.0902 Epoch: 1 Global Step: 12480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:15,987-Speed 3340.69 samples/sec Loss 9.7670 LearningRate 0.0902 Epoch: 1 Global Step: 12490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:19,075-Speed 3316.73 samples/sec Loss 9.7535 LearningRate 0.0902 Epoch: 1 Global Step: 12500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:22,107-Speed 3378.81 samples/sec Loss 9.8595 LearningRate 0.0902 Epoch: 1 Global Step: 12510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:25,171-Speed 3342.85 samples/sec Loss 9.8027 LearningRate 0.0902 Epoch: 1 Global Step: 12520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:28,256-Speed 3321.07 samples/sec Loss 9.7478 LearningRate 0.0902 Epoch: 1 Global Step: 12530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:31,264-Speed 3404.48 samples/sec Loss 9.8406 LearningRate 0.0902 Epoch: 1 Global Step: 12540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:51:34,266-Speed 3412.53 samples/sec Loss 9.9273 LearningRate 0.0902 Epoch: 1 Global Step: 12550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:37,365-Speed 3305.45 samples/sec Loss 9.9473 LearningRate 0.0901 Epoch: 1 Global Step: 12560 Fp16 Grad Scale: 32768 Required: 20 hours 
Training: 2022-04-27 02:51:40,426-Speed 3347.02 samples/sec Loss 9.8740 LearningRate 0.0901 Epoch: 1 Global Step: 12570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:43,514-Speed 3317.74 samples/sec Loss 9.8959 LearningRate 0.0901 Epoch: 1 Global Step: 12580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:46,562-Speed 3360.06 samples/sec Loss 9.8226 LearningRate 0.0901 Epoch: 1 Global Step: 12590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:49,598-Speed 3373.88 samples/sec Loss 9.7825 LearningRate 0.0901 Epoch: 1 Global Step: 12600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:52,653-Speed 3352.84 samples/sec Loss 9.8327 LearningRate 0.0901 Epoch: 1 Global Step: 12610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:55,693-Speed 3369.33 samples/sec Loss 9.8523 LearningRate 0.0901 Epoch: 1 Global Step: 12620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:51:59,319-Speed 2825.06 samples/sec Loss 9.8580 LearningRate 0.0901 Epoch: 1 Global Step: 12630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:52:02,352-Speed 3377.66 samples/sec Loss 9.8868 LearningRate 0.0901 Epoch: 1 Global Step: 12640 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:52:05,418-Speed 3340.67 samples/sec Loss 9.8720 LearningRate 0.0901 Epoch: 1 Global Step: 12650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:08,424-Speed 3407.16 samples/sec Loss 9.6508 LearningRate 0.0901 Epoch: 1 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:11,523-Speed 3305.78 samples/sec Loss 9.8061 LearningRate 0.0901 Epoch: 1 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:14,531-Speed 3405.04 samples/sec Loss 10.0826 LearningRate 0.0901 Epoch: 1 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:17,590-Speed 3348.53 
samples/sec Loss 9.8642 LearningRate 0.0900 Epoch: 1 Global Step: 12690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:20,615-Speed 3385.92 samples/sec Loss 9.9068 LearningRate 0.0900 Epoch: 1 Global Step: 12700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:23,654-Speed 3370.65 samples/sec Loss 9.8238 LearningRate 0.0900 Epoch: 1 Global Step: 12710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:26,812-Speed 3244.17 samples/sec Loss 9.8204 LearningRate 0.0900 Epoch: 1 Global Step: 12720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:29,864-Speed 3356.48 samples/sec Loss 9.6903 LearningRate 0.0900 Epoch: 1 Global Step: 12730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:32,939-Speed 3330.52 samples/sec Loss 9.8839 LearningRate 0.0900 Epoch: 1 Global Step: 12740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:36,040-Speed 3303.82 samples/sec Loss 9.9688 LearningRate 0.0900 Epoch: 1 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:52:39,103-Speed 3343.91 samples/sec Loss 9.9751 LearningRate 0.0900 Epoch: 1 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:52:42,173-Speed 3337.37 samples/sec Loss 9.9271 LearningRate 0.0900 Epoch: 1 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:52:45,183-Speed 3402.74 samples/sec Loss 9.7691 LearningRate 0.0900 Epoch: 1 Global Step: 12780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:48,227-Speed 3364.67 samples/sec Loss 9.8620 LearningRate 0.0900 Epoch: 1 Global Step: 12790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:51,331-Speed 3300.34 samples/sec Loss 9.9402 LearningRate 0.0900 Epoch: 1 Global Step: 12800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:54,383-Speed 3356.28 samples/sec Loss 9.9675 LearningRate 0.0900 Epoch: 1 
Global Step: 12810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:52:57,407-Speed 3387.43 samples/sec Loss 9.9601 LearningRate 0.0899 Epoch: 1 Global Step: 12820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:53:00,433-Speed 3384.97 samples/sec Loss 9.9712 LearningRate 0.0899 Epoch: 1 Global Step: 12830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:03,497-Speed 3343.85 samples/sec Loss 9.9455 LearningRate 0.0899 Epoch: 1 Global Step: 12840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:06,547-Speed 3358.47 samples/sec Loss 9.9507 LearningRate 0.0899 Epoch: 1 Global Step: 12850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:09,561-Speed 3397.84 samples/sec Loss 9.9747 LearningRate 0.0899 Epoch: 1 Global Step: 12860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:12,610-Speed 3360.11 samples/sec Loss 9.9862 LearningRate 0.0899 Epoch: 1 Global Step: 12870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:15,640-Speed 3380.17 samples/sec Loss 9.9928 LearningRate 0.0899 Epoch: 1 Global Step: 12880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:18,666-Speed 3385.48 samples/sec Loss 10.0239 LearningRate 0.0899 Epoch: 1 Global Step: 12890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:21,673-Speed 3406.39 samples/sec Loss 9.7993 LearningRate 0.0899 Epoch: 1 Global Step: 12900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:24,717-Speed 3365.66 samples/sec Loss 9.9540 LearningRate 0.0899 Epoch: 1 Global Step: 12910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:27,742-Speed 3385.11 samples/sec Loss 9.8881 LearningRate 0.0899 Epoch: 1 Global Step: 12920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:30,798-Speed 3352.39 samples/sec Loss 9.8813 LearningRate 0.0899 Epoch: 1 Global Step: 12930 Fp16 Grad Scale: 32768 Required: 20 
hours Training: 2022-04-27 02:53:33,826-Speed 3382.92 samples/sec Loss 9.9039 LearningRate 0.0899 Epoch: 1 Global Step: 12940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:36,938-Speed 3291.31 samples/sec Loss 9.8866 LearningRate 0.0898 Epoch: 1 Global Step: 12950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:40,059-Speed 3282.62 samples/sec Loss 9.9019 LearningRate 0.0898 Epoch: 1 Global Step: 12960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:43,166-Speed 3295.72 samples/sec Loss 9.8631 LearningRate 0.0898 Epoch: 1 Global Step: 12970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:46,215-Speed 3359.49 samples/sec Loss 9.9203 LearningRate 0.0898 Epoch: 1 Global Step: 12980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:49,298-Speed 3323.14 samples/sec Loss 10.0089 LearningRate 0.0898 Epoch: 1 Global Step: 12990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:52,381-Speed 3322.98 samples/sec Loss 9.9257 LearningRate 0.0898 Epoch: 1 Global Step: 13000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:55,436-Speed 3352.56 samples/sec Loss 9.9727 LearningRate 0.0898 Epoch: 1 Global Step: 13010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:53:58,523-Speed 3317.68 samples/sec Loss 9.9306 LearningRate 0.0898 Epoch: 1 Global Step: 13020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:54:01,577-Speed 3354.08 samples/sec Loss 10.0393 LearningRate 0.0898 Epoch: 1 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:04,642-Speed 3342.25 samples/sec Loss 9.9127 LearningRate 0.0898 Epoch: 1 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:07,670-Speed 3383.82 samples/sec Loss 10.0274 LearningRate 0.0898 Epoch: 1 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:10,691-Speed 
3389.87 samples/sec Loss 9.9538 LearningRate 0.0898 Epoch: 1 Global Step: 13060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:13,760-Speed 3337.58 samples/sec Loss 10.0113 LearningRate 0.0898 Epoch: 1 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:16,842-Speed 3323.67 samples/sec Loss 9.9796 LearningRate 0.0897 Epoch: 1 Global Step: 13080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:19,872-Speed 3380.35 samples/sec Loss 9.8919 LearningRate 0.0897 Epoch: 1 Global Step: 13090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:22,919-Speed 3362.96 samples/sec Loss 10.0215 LearningRate 0.0897 Epoch: 1 Global Step: 13100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:25,997-Speed 3327.43 samples/sec Loss 10.1007 LearningRate 0.0897 Epoch: 1 Global Step: 13110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:29,051-Speed 3354.00 samples/sec Loss 10.0498 LearningRate 0.0897 Epoch: 1 Global Step: 13120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:32,058-Speed 3406.84 samples/sec Loss 9.9810 LearningRate 0.0897 Epoch: 1 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:54:35,067-Speed 3404.18 samples/sec Loss 10.1064 LearningRate 0.0897 Epoch: 1 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 02:54:38,075-Speed 3404.64 samples/sec Loss 10.0257 LearningRate 0.0897 Epoch: 1 Global Step: 13150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:41,127-Speed 3356.23 samples/sec Loss 10.0386 LearningRate 0.0897 Epoch: 1 Global Step: 13160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:44,150-Speed 3389.12 samples/sec Loss 10.0075 LearningRate 0.0897 Epoch: 1 Global Step: 13170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:54:47,147-Speed 3417.32 samples/sec Loss 10.0902 LearningRate 
0.0897 Epoch: 1 Global Step: 13180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:54:50,171-Speed 3387.53 samples/sec Loss 9.9751 LearningRate 0.0897 Epoch: 1 Global Step: 13190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:54:53,221-Speed 3358.39 samples/sec Loss 10.0921 LearningRate 0.0897 Epoch: 1 Global Step: 13200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:54:56,237-Speed 3395.53 samples/sec Loss 10.0470 LearningRate 0.0896 Epoch: 1 Global Step: 13210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:54:59,247-Speed 3403.93 samples/sec Loss 10.0618 LearningRate 0.0896 Epoch: 1 Global Step: 13220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:02,259-Speed 3400.63 samples/sec Loss 10.1569 LearningRate 0.0896 Epoch: 1 Global Step: 13230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:05,272-Speed 3399.37 samples/sec Loss 10.1316 LearningRate 0.0896 Epoch: 1 Global Step: 13240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:08,324-Speed 3356.98 samples/sec Loss 10.0519 LearningRate 0.0896 Epoch: 1 Global Step: 13250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:11,430-Speed 3297.48 samples/sec Loss 10.0208 LearningRate 0.0896 Epoch: 1 Global Step: 13260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:14,471-Speed 3368.08 samples/sec Loss 10.2316 LearningRate 0.0896 Epoch: 1 Global Step: 13270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:17,535-Speed 3343.50 samples/sec Loss 10.1318 LearningRate 0.0896 Epoch: 1 Global Step: 13280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:55:20,558-Speed 3388.36 samples/sec Loss 10.0415 LearningRate 0.0896 Epoch: 1 Global Step: 13290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:55:23,626-Speed 3338.25 samples/sec Loss 9.9797 LearningRate 0.0896 Epoch: 1 Global Step: 13300 Fp16 Grad 
Scale: 65536 Required: 20 hours Training: 2022-04-27 02:55:26,624-Speed 3417.35 samples/sec Loss 10.0599 LearningRate 0.0896 Epoch: 1 Global Step: 13310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:29,750-Speed 3276.57 samples/sec Loss 10.1217 LearningRate 0.0896 Epoch: 1 Global Step: 13320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:32,815-Speed 3341.77 samples/sec Loss 10.1054 LearningRate 0.0896 Epoch: 1 Global Step: 13330 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:35,874-Speed 3349.17 samples/sec Loss 10.0688 LearningRate 0.0895 Epoch: 1 Global Step: 13340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:38,911-Speed 3372.20 samples/sec Loss 9.9507 LearningRate 0.0895 Epoch: 1 Global Step: 13350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:41,971-Speed 3348.23 samples/sec Loss 10.1096 LearningRate 0.0895 Epoch: 1 Global Step: 13360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:44,982-Speed 3401.92 samples/sec Loss 10.0131 LearningRate 0.0895 Epoch: 1 Global Step: 13370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:48,064-Speed 3323.84 samples/sec Loss 10.1405 LearningRate 0.0895 Epoch: 1 Global Step: 13380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:51,114-Speed 3358.07 samples/sec Loss 10.1100 LearningRate 0.0895 Epoch: 1 Global Step: 13390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:54,215-Speed 3303.13 samples/sec Loss 10.1454 LearningRate 0.0895 Epoch: 1 Global Step: 13400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:55:57,234-Speed 3393.44 samples/sec Loss 10.1947 LearningRate 0.0895 Epoch: 1 Global Step: 13410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:56:00,254-Speed 3391.83 samples/sec Loss 10.0970 LearningRate 0.0895 Epoch: 1 Global Step: 13420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-04-27 02:56:03,360-Speed 3297.42 samples/sec Loss 10.1928 LearningRate 0.0895 Epoch: 1 Global Step: 13430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:06,461-Speed 3303.85 samples/sec Loss 10.1126 LearningRate 0.0895 Epoch: 1 Global Step: 13440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:09,477-Speed 3395.43 samples/sec Loss 10.1083 LearningRate 0.0895 Epoch: 1 Global Step: 13450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:12,532-Speed 3353.66 samples/sec Loss 10.0576 LearningRate 0.0895 Epoch: 1 Global Step: 13460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:15,639-Speed 3295.91 samples/sec Loss 10.1203 LearningRate 0.0894 Epoch: 1 Global Step: 13470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:18,678-Speed 3370.51 samples/sec Loss 10.0982 LearningRate 0.0894 Epoch: 1 Global Step: 13480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:21,724-Speed 3363.24 samples/sec Loss 10.2324 LearningRate 0.0894 Epoch: 1 Global Step: 13490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:24,755-Speed 3379.06 samples/sec Loss 9.9822 LearningRate 0.0894 Epoch: 1 Global Step: 13500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:27,768-Speed 3400.05 samples/sec Loss 9.9879 LearningRate 0.0894 Epoch: 1 Global Step: 13510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:30,853-Speed 3319.75 samples/sec Loss 10.1077 LearningRate 0.0894 Epoch: 1 Global Step: 13520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:56:33,855-Speed 3411.70 samples/sec Loss 10.1903 LearningRate 0.0894 Epoch: 1 Global Step: 13530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:56:36,858-Speed 3411.33 samples/sec Loss 10.1178 LearningRate 0.0894 Epoch: 1 Global Step: 13540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:56:39,904-Speed 3363.15 
samples/sec Loss 10.0807 LearningRate 0.0894 Epoch: 1 Global Step: 13550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:56:42,956-Speed 3355.93 samples/sec Loss 10.1764 LearningRate 0.0894 Epoch: 1 Global Step: 13560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:56:45,994-Speed 3372.15 samples/sec Loss 10.1327 LearningRate 0.0894 Epoch: 1 Global Step: 13570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:56:48,996-Speed 3412.23 samples/sec Loss 10.1625 LearningRate 0.0894 Epoch: 1 Global Step: 13580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:52,068-Speed 3334.83 samples/sec Loss 10.1301 LearningRate 0.0894 Epoch: 1 Global Step: 13590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:55,140-Speed 3334.12 samples/sec Loss 10.1555 LearningRate 0.0894 Epoch: 1 Global Step: 13600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:56:58,157-Speed 3395.28 samples/sec Loss 10.1935 LearningRate 0.0893 Epoch: 1 Global Step: 13610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:01,164-Speed 3406.14 samples/sec Loss 10.0903 LearningRate 0.0893 Epoch: 1 Global Step: 13620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:04,248-Speed 3321.76 samples/sec Loss 10.0839 LearningRate 0.0893 Epoch: 1 Global Step: 13630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:07,256-Speed 3405.43 samples/sec Loss 10.3108 LearningRate 0.0893 Epoch: 1 Global Step: 13640 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:10,254-Speed 3416.13 samples/sec Loss 10.1591 LearningRate 0.0893 Epoch: 1 Global Step: 13650 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:13,332-Speed 3327.90 samples/sec Loss 10.1506 LearningRate 0.0893 Epoch: 1 Global Step: 13660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:16,403-Speed 3335.99 samples/sec Loss 10.1664 LearningRate 
0.0893 Epoch: 1 Global Step: 13670 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:19,437-Speed 3375.28 samples/sec Loss 10.1646 LearningRate 0.0893 Epoch: 1 Global Step: 13680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:57:22,440-Speed 3411.23 samples/sec Loss 10.0410 LearningRate 0.0893 Epoch: 1 Global Step: 13690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:57:25,548-Speed 3296.27 samples/sec Loss 10.1659 LearningRate 0.0893 Epoch: 1 Global Step: 13700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:57:28,592-Speed 3364.81 samples/sec Loss 10.1605 LearningRate 0.0893 Epoch: 1 Global Step: 13710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:57:31,658-Speed 3341.01 samples/sec Loss 10.0468 LearningRate 0.0893 Epoch: 1 Global Step: 13720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:57:34,658-Speed 3414.76 samples/sec Loss 10.2328 LearningRate 0.0893 Epoch: 1 Global Step: 13730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:37,725-Speed 3339.92 samples/sec Loss 10.2348 LearningRate 0.0892 Epoch: 1 Global Step: 13740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:40,835-Speed 3292.57 samples/sec Loss 10.2378 LearningRate 0.0892 Epoch: 1 Global Step: 13750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:43,867-Speed 3378.87 samples/sec Loss 10.0615 LearningRate 0.0892 Epoch: 1 Global Step: 13760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:46,889-Speed 3389.43 samples/sec Loss 10.1832 LearningRate 0.0892 Epoch: 1 Global Step: 13770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:49,970-Speed 3324.38 samples/sec Loss 10.2257 LearningRate 0.0892 Epoch: 1 Global Step: 13780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:53,057-Speed 3318.80 samples/sec Loss 10.2686 LearningRate 0.0892 Epoch: 1 Global Step: 13790 Fp16 
Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:56,160-Speed 3301.58 samples/sec Loss 10.1043 LearningRate 0.0892 Epoch: 1 Global Step: 13800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:57:59,154-Speed 3420.48 samples/sec Loss 10.1802 LearningRate 0.0892 Epoch: 1 Global Step: 13810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:02,170-Speed 3396.93 samples/sec Loss 10.2677 LearningRate 0.0892 Epoch: 1 Global Step: 13820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:05,240-Speed 3336.99 samples/sec Loss 10.1921 LearningRate 0.0892 Epoch: 1 Global Step: 13830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:08,249-Speed 3403.66 samples/sec Loss 10.2963 LearningRate 0.0892 Epoch: 1 Global Step: 13840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:11,279-Speed 3381.39 samples/sec Loss 10.1482 LearningRate 0.0892 Epoch: 1 Global Step: 13850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:14,308-Speed 3382.07 samples/sec Loss 10.2864 LearningRate 0.0892 Epoch: 1 Global Step: 13860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:17,317-Speed 3404.14 samples/sec Loss 10.1644 LearningRate 0.0891 Epoch: 1 Global Step: 13870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:20,330-Speed 3399.54 samples/sec Loss 10.3866 LearningRate 0.0891 Epoch: 1 Global Step: 13880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:23,334-Speed 3409.83 samples/sec Loss 10.2615 LearningRate 0.0891 Epoch: 1 Global Step: 13890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:26,367-Speed 3377.89 samples/sec Loss 10.2310 LearningRate 0.0891 Epoch: 1 Global Step: 13900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:29,386-Speed 3393.16 samples/sec Loss 10.2300 LearningRate 0.0891 Epoch: 1 Global Step: 13910 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-04-27 02:58:32,417-Speed 3379.98 samples/sec Loss 10.1876 LearningRate 0.0891 Epoch: 1 Global Step: 13920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:35,420-Speed 3411.09 samples/sec Loss 10.0571 LearningRate 0.0891 Epoch: 1 Global Step: 13930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:38,451-Speed 3379.41 samples/sec Loss 10.2094 LearningRate 0.0891 Epoch: 1 Global Step: 13940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:58:41,467-Speed 3396.01 samples/sec Loss 10.2267 LearningRate 0.0891 Epoch: 1 Global Step: 13950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:44,476-Speed 3404.52 samples/sec Loss 10.2696 LearningRate 0.0891 Epoch: 1 Global Step: 13960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:47,496-Speed 3391.88 samples/sec Loss 10.1447 LearningRate 0.0891 Epoch: 1 Global Step: 13970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:50,611-Speed 3288.77 samples/sec Loss 10.1094 LearningRate 0.0891 Epoch: 1 Global Step: 13980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:53,674-Speed 3344.33 samples/sec Loss 10.1038 LearningRate 0.0891 Epoch: 1 Global Step: 13990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:56,667-Speed 3422.11 samples/sec Loss 10.1262 LearningRate 0.0890 Epoch: 1 Global Step: 14000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:58:59,688-Speed 3390.88 samples/sec Loss 10.3112 LearningRate 0.0890 Epoch: 1 Global Step: 14010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:59:02,692-Speed 3409.81 samples/sec Loss 10.2551 LearningRate 0.0890 Epoch: 1 Global Step: 14020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:59:05,781-Speed 3315.65 samples/sec Loss 10.1968 LearningRate 0.0890 Epoch: 1 Global Step: 14030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:59:08,790-Speed 
3404.89 samples/sec Loss 10.1082 LearningRate 0.0890 Epoch: 1 Global Step: 14040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:59:11,920-Speed 3272.74 samples/sec Loss 10.3176 LearningRate 0.0890 Epoch: 1 Global Step: 14050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:59:14,943-Speed 3387.79 samples/sec Loss 10.2013 LearningRate 0.0890 Epoch: 1 Global Step: 14060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:59:18,034-Speed 3314.72 samples/sec Loss 10.1706 LearningRate 0.0890 Epoch: 1 Global Step: 14070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:59:21,042-Speed 3404.86 samples/sec Loss 10.0774 LearningRate 0.0890 Epoch: 1 Global Step: 14080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:59:24,136-Speed 3310.83 samples/sec Loss 10.1528 LearningRate 0.0890 Epoch: 1 Global Step: 14090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 02:59:27,178-Speed 3366.62 samples/sec Loss 10.2464 LearningRate 0.0890 Epoch: 1 Global Step: 14100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 02:59:30,223-Speed 3364.88 samples/sec Loss 10.1824 LearningRate 0.0890 Epoch: 1 Global Step: 14110 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:33,236-Speed 3398.98 samples/sec Loss 10.2032 LearningRate 0.0890 Epoch: 1 Global Step: 14120 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:36,284-Speed 3361.31 samples/sec Loss 10.1896 LearningRate 0.0889 Epoch: 1 Global Step: 14130 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:39,342-Speed 3349.05 samples/sec Loss 10.0750 LearningRate 0.0889 Epoch: 1 Global Step: 14140 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:42,445-Speed 3301.10 samples/sec Loss 10.1409 LearningRate 0.0889 Epoch: 1 Global Step: 14150 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:45,473-Speed 3383.31 samples/sec Loss 10.3200 
LearningRate 0.0889 Epoch: 1 Global Step: 14160 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:48,616-Speed 3259.18 samples/sec Loss 10.2157 LearningRate 0.0889 Epoch: 1 Global Step: 14170 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:51,728-Speed 3291.04 samples/sec Loss 10.2167 LearningRate 0.0889 Epoch: 1 Global Step: 14180 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:54,777-Speed 3359.99 samples/sec Loss 10.2105 LearningRate 0.0889 Epoch: 1 Global Step: 14190 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 02:59:57,828-Speed 3356.78 samples/sec Loss 10.3305 LearningRate 0.0889 Epoch: 1 Global Step: 14200 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:00:00,905-Speed 3329.59 samples/sec Loss 10.2161 LearningRate 0.0889 Epoch: 1 Global Step: 14210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:03,922-Speed 3394.95 samples/sec Loss 10.3057 LearningRate 0.0889 Epoch: 1 Global Step: 14220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:06,921-Speed 3415.45 samples/sec Loss 10.3241 LearningRate 0.0889 Epoch: 1 Global Step: 14230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:09,917-Speed 3418.60 samples/sec Loss 10.3498 LearningRate 0.0889 Epoch: 1 Global Step: 14240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:12,981-Speed 3342.99 samples/sec Loss 10.2199 LearningRate 0.0889 Epoch: 1 Global Step: 14250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:16,056-Speed 3331.90 samples/sec Loss 10.2588 LearningRate 0.0888 Epoch: 1 Global Step: 14260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:19,076-Speed 3391.02 samples/sec Loss 10.2217 LearningRate 0.0888 Epoch: 1 Global Step: 14270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:22,096-Speed 3392.85 samples/sec Loss 10.3678 LearningRate 0.0888 Epoch: 1 Global Step: 
14280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:25,137-Speed 3368.19 samples/sec Loss 10.1835 LearningRate 0.0888 Epoch: 1 Global Step: 14290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:28,173-Speed 3373.32 samples/sec Loss 10.1455 LearningRate 0.0888 Epoch: 1 Global Step: 14300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:31,209-Speed 3374.16 samples/sec Loss 10.2944 LearningRate 0.0888 Epoch: 1 Global Step: 14310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:00:34,226-Speed 3395.95 samples/sec Loss 10.1883 LearningRate 0.0888 Epoch: 1 Global Step: 14320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:00:37,272-Speed 3361.77 samples/sec Loss 10.1424 LearningRate 0.0888 Epoch: 1 Global Step: 14330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:00:40,288-Speed 3397.44 samples/sec Loss 10.1891 LearningRate 0.0888 Epoch: 1 Global Step: 14340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:43,333-Speed 3363.62 samples/sec Loss 10.3423 LearningRate 0.0888 Epoch: 1 Global Step: 14350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:46,417-Speed 3321.79 samples/sec Loss 10.2224 LearningRate 0.0888 Epoch: 1 Global Step: 14360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:49,512-Speed 3309.48 samples/sec Loss 10.2652 LearningRate 0.0888 Epoch: 1 Global Step: 14370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:52,534-Speed 3390.10 samples/sec Loss 10.2227 LearningRate 0.0888 Epoch: 1 Global Step: 14380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:55,651-Speed 3286.27 samples/sec Loss 10.1913 LearningRate 0.0888 Epoch: 1 Global Step: 14390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:00:58,660-Speed 3403.89 samples/sec Loss 10.2150 LearningRate 0.0887 Epoch: 1 Global Step: 14400 Fp16 Grad Scale: 32768 Required: 20 
hours Training: 2022-04-27 03:01:01,687-Speed 3383.85 samples/sec Loss 10.2038 LearningRate 0.0887 Epoch: 1 Global Step: 14410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:04,766-Speed 3327.09 samples/sec Loss 10.4094 LearningRate 0.0887 Epoch: 1 Global Step: 14420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:07,846-Speed 3326.11 samples/sec Loss 10.1751 LearningRate 0.0887 Epoch: 1 Global Step: 14430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:10,841-Speed 3420.27 samples/sec Loss 10.2826 LearningRate 0.0887 Epoch: 1 Global Step: 14440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:13,900-Speed 3348.09 samples/sec Loss 10.1649 LearningRate 0.0887 Epoch: 1 Global Step: 14450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:16,994-Speed 3310.77 samples/sec Loss 10.2125 LearningRate 0.0887 Epoch: 1 Global Step: 14460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:19,987-Speed 3423.06 samples/sec Loss 10.2554 LearningRate 0.0887 Epoch: 1 Global Step: 14470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:23,021-Speed 3375.59 samples/sec Loss 10.2210 LearningRate 0.0887 Epoch: 1 Global Step: 14480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:26,063-Speed 3367.47 samples/sec Loss 10.1647 LearningRate 0.0887 Epoch: 1 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:29,094-Speed 3380.05 samples/sec Loss 10.2298 LearningRate 0.0887 Epoch: 1 Global Step: 14500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:32,116-Speed 3389.76 samples/sec Loss 10.1696 LearningRate 0.0887 Epoch: 1 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:01:35,137-Speed 3390.40 samples/sec Loss 10.2873 LearningRate 0.0887 Epoch: 1 Global Step: 14520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
03:01:38,167-Speed 3380.32 samples/sec Loss 10.3623 LearningRate 0.0886 Epoch: 1 Global Step: 14530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:41,171-Speed 3409.96 samples/sec Loss 10.2610 LearningRate 0.0886 Epoch: 1 Global Step: 14540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:44,231-Speed 3348.05 samples/sec Loss 10.2780 LearningRate 0.0886 Epoch: 1 Global Step: 14550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:47,263-Speed 3378.74 samples/sec Loss 10.2845 LearningRate 0.0886 Epoch: 1 Global Step: 14560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:50,339-Speed 3329.45 samples/sec Loss 10.2873 LearningRate 0.0886 Epoch: 1 Global Step: 14570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:53,418-Speed 3327.02 samples/sec Loss 10.3081 LearningRate 0.0886 Epoch: 1 Global Step: 14580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:56,478-Speed 3347.10 samples/sec Loss 10.2989 LearningRate 0.0886 Epoch: 1 Global Step: 14590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:01:59,506-Speed 3382.80 samples/sec Loss 10.3583 LearningRate 0.0886 Epoch: 1 Global Step: 14600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:02,562-Speed 3352.76 samples/sec Loss 10.3106 LearningRate 0.0886 Epoch: 1 Global Step: 14610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:05,644-Speed 3322.64 samples/sec Loss 10.1864 LearningRate 0.0886 Epoch: 1 Global Step: 14620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:02:08,687-Speed 3366.37 samples/sec Loss 10.2108 LearningRate 0.0886 Epoch: 1 Global Step: 14630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:02:11,749-Speed 3346.02 samples/sec Loss 10.3418 LearningRate 0.0886 Epoch: 1 Global Step: 14640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:02:14,805-Speed 3351.87 samples/sec Loss 
10.2877 LearningRate 0.0886 Epoch: 1 Global Step: 14650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:02:17,887-Speed 3323.03 samples/sec Loss 10.3846 LearningRate 0.0885 Epoch: 1 Global Step: 14660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:20,927-Speed 3370.38 samples/sec Loss 10.2861 LearningRate 0.0885 Epoch: 1 Global Step: 14670 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:23,968-Speed 3367.57 samples/sec Loss 10.3300 LearningRate 0.0885 Epoch: 1 Global Step: 14680 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:27,044-Speed 3329.92 samples/sec Loss 10.1424 LearningRate 0.0885 Epoch: 1 Global Step: 14690 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:30,056-Speed 3400.65 samples/sec Loss 10.2704 LearningRate 0.0885 Epoch: 1 Global Step: 14700 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:33,108-Speed 3356.88 samples/sec Loss 10.3005 LearningRate 0.0885 Epoch: 1 Global Step: 14710 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:36,145-Speed 3373.23 samples/sec Loss 10.1777 LearningRate 0.0885 Epoch: 1 Global Step: 14720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:39,191-Speed 3362.10 samples/sec Loss 10.2784 LearningRate 0.0885 Epoch: 1 Global Step: 14730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:42,259-Speed 3339.43 samples/sec Loss 10.2599 LearningRate 0.0885 Epoch: 1 Global Step: 14740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:45,288-Speed 3381.08 samples/sec Loss 10.1637 LearningRate 0.0885 Epoch: 1 Global Step: 14750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:02:48,365-Speed 3328.73 samples/sec Loss 10.2520 LearningRate 0.0885 Epoch: 1 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:02:51,476-Speed 3292.89 samples/sec Loss 10.2530 LearningRate 0.0885 Epoch: 1 Global 
Step: 14770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:02:54,530-Speed 3353.63 samples/sec Loss 10.2616 LearningRate 0.0885 Epoch: 1 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:02:57,533-Speed 3411.61 samples/sec Loss 10.3973 LearningRate 0.0884 Epoch: 1 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:00,557-Speed 3388.06 samples/sec Loss 10.2556 LearningRate 0.0884 Epoch: 1 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:03,602-Speed 3364.21 samples/sec Loss 10.1874 LearningRate 0.0884 Epoch: 1 Global Step: 14810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:06,662-Speed 3347.49 samples/sec Loss 10.3055 LearningRate 0.0884 Epoch: 1 Global Step: 14820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:09,686-Speed 3387.45 samples/sec Loss 10.3328 LearningRate 0.0884 Epoch: 1 Global Step: 14830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:12,756-Speed 3336.45 samples/sec Loss 10.2133 LearningRate 0.0884 Epoch: 1 Global Step: 14840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:15,839-Speed 3322.37 samples/sec Loss 10.1728 LearningRate 0.0884 Epoch: 1 Global Step: 14850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:18,893-Speed 3354.33 samples/sec Loss 10.3326 LearningRate 0.0884 Epoch: 1 Global Step: 14860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:21,906-Speed 3399.79 samples/sec Loss 10.2999 LearningRate 0.0884 Epoch: 1 Global Step: 14870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:24,935-Speed 3381.55 samples/sec Loss 10.3785 LearningRate 0.0884 Epoch: 1 Global Step: 14880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:27,993-Speed 3349.84 samples/sec Loss 10.1486 LearningRate 0.0884 Epoch: 1 Global Step: 14890 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-04-27 03:03:31,061-Speed 3338.71 samples/sec Loss 10.2041 LearningRate 0.0884 Epoch: 1 Global Step: 14900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:34,104-Speed 3365.91 samples/sec Loss 10.2141 LearningRate 0.0884 Epoch: 1 Global Step: 14910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:37,162-Speed 3350.49 samples/sec Loss 10.3192 LearningRate 0.0883 Epoch: 1 Global Step: 14920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:40,257-Speed 3311.30 samples/sec Loss 10.2153 LearningRate 0.0883 Epoch: 1 Global Step: 14930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:43,326-Speed 3337.97 samples/sec Loss 10.2284 LearningRate 0.0883 Epoch: 1 Global Step: 14940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:46,342-Speed 3395.38 samples/sec Loss 10.2223 LearningRate 0.0883 Epoch: 1 Global Step: 14950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:49,388-Speed 3363.51 samples/sec Loss 10.2949 LearningRate 0.0883 Epoch: 1 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:52,498-Speed 3293.48 samples/sec Loss 10.2119 LearningRate 0.0883 Epoch: 1 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:03:55,551-Speed 3355.76 samples/sec Loss 10.3233 LearningRate 0.0883 Epoch: 1 Global Step: 14980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:03:58,597-Speed 3362.76 samples/sec Loss 10.2305 LearningRate 0.0883 Epoch: 1 Global Step: 14990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:01,689-Speed 3312.54 samples/sec Loss 10.2978 LearningRate 0.0883 Epoch: 1 Global Step: 15000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:04,728-Speed 3370.44 samples/sec Loss 10.1676 LearningRate 0.0883 Epoch: 1 Global Step: 15010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
03:04:07,759-Speed 3379.83 samples/sec Loss 10.2325 LearningRate 0.0883 Epoch: 1 Global Step: 15020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:10,803-Speed 3365.47 samples/sec Loss 10.1828 LearningRate 0.0883 Epoch: 1 Global Step: 15030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:13,845-Speed 3367.44 samples/sec Loss 10.1812 LearningRate 0.0883 Epoch: 1 Global Step: 15040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:16,852-Speed 3406.19 samples/sec Loss 10.2228 LearningRate 0.0883 Epoch: 1 Global Step: 15050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:19,900-Speed 3360.94 samples/sec Loss 10.2770 LearningRate 0.0882 Epoch: 1 Global Step: 15060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:22,903-Speed 3411.30 samples/sec Loss 10.4429 LearningRate 0.0882 Epoch: 1 Global Step: 15070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:25,912-Speed 3403.77 samples/sec Loss 10.2028 LearningRate 0.0882 Epoch: 1 Global Step: 15080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:04:28,966-Speed 3353.88 samples/sec Loss 10.2797 LearningRate 0.0882 Epoch: 1 Global Step: 15090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:04:32,046-Speed 3326.45 samples/sec Loss 10.1818 LearningRate 0.0882 Epoch: 1 Global Step: 15100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:04:35,038-Speed 3423.32 samples/sec Loss 10.1534 LearningRate 0.0882 Epoch: 1 Global Step: 15110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:38,085-Speed 3362.10 samples/sec Loss 10.2732 LearningRate 0.0882 Epoch: 1 Global Step: 15120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:41,119-Speed 3375.29 samples/sec Loss 10.1318 LearningRate 0.0882 Epoch: 1 Global Step: 15130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:44,116-Speed 3418.36 samples/sec Loss 
10.2502 LearningRate 0.0882 Epoch: 1 Global Step: 15140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:47,109-Speed 3422.20 samples/sec Loss 10.3078 LearningRate 0.0882 Epoch: 1 Global Step: 15150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:50,170-Speed 3346.99 samples/sec Loss 10.4196 LearningRate 0.0882 Epoch: 1 Global Step: 15160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:53,196-Speed 3384.78 samples/sec Loss 10.2420 LearningRate 0.0882 Epoch: 1 Global Step: 15170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:56,237-Speed 3367.76 samples/sec Loss 10.2390 LearningRate 0.0882 Epoch: 1 Global Step: 15180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:04:59,268-Speed 3379.69 samples/sec Loss 10.1995 LearningRate 0.0881 Epoch: 1 Global Step: 15190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:02,312-Speed 3365.91 samples/sec Loss 10.3231 LearningRate 0.0881 Epoch: 1 Global Step: 15200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:05,339-Speed 3383.69 samples/sec Loss 10.3229 LearningRate 0.0881 Epoch: 1 Global Step: 15210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:05:08,370-Speed 3378.87 samples/sec Loss 10.3240 LearningRate 0.0881 Epoch: 1 Global Step: 15220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:05:11,394-Speed 3387.76 samples/sec Loss 10.2410 LearningRate 0.0881 Epoch: 1 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:05:14,457-Speed 3344.76 samples/sec Loss 10.3423 LearningRate 0.0881 Epoch: 1 Global Step: 15240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:17,499-Speed 3366.56 samples/sec Loss 10.2559 LearningRate 0.0881 Epoch: 1 Global Step: 15250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:20,536-Speed 3373.23 samples/sec Loss 10.2011 LearningRate 0.0881 Epoch: 1 Global 
Step: 15260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:23,562-Speed 3385.10 samples/sec Loss 10.1502 LearningRate 0.0881 Epoch: 1 Global Step: 15270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:26,566-Speed 3409.22 samples/sec Loss 10.1602 LearningRate 0.0881 Epoch: 1 Global Step: 15280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:29,583-Speed 3395.47 samples/sec Loss 10.1859 LearningRate 0.0881 Epoch: 1 Global Step: 15290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:32,640-Speed 3350.96 samples/sec Loss 10.2330 LearningRate 0.0881 Epoch: 1 Global Step: 15300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:35,687-Speed 3362.28 samples/sec Loss 10.3067 LearningRate 0.0881 Epoch: 1 Global Step: 15310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:38,730-Speed 3365.81 samples/sec Loss 10.3078 LearningRate 0.0880 Epoch: 1 Global Step: 15320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:41,737-Speed 3406.06 samples/sec Loss 10.2875 LearningRate 0.0880 Epoch: 1 Global Step: 15330 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:05:44,753-Speed 3396.34 samples/sec Loss 10.2802 LearningRate 0.0880 Epoch: 1 Global Step: 15340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:05:47,756-Speed 3410.59 samples/sec Loss 10.2629 LearningRate 0.0880 Epoch: 1 Global Step: 15350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:05:50,770-Speed 3399.19 samples/sec Loss 10.3866 LearningRate 0.0880 Epoch: 1 Global Step: 15360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:05:53,783-Speed 3399.35 samples/sec Loss 10.1965 LearningRate 0.0880 Epoch: 1 Global Step: 15370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:05:56,789-Speed 3407.70 samples/sec Loss 10.2950 LearningRate 0.0880 Epoch: 1 Global Step: 15380 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-27 03:05:59,781-Speed 3423.15 samples/sec Loss 10.2959 LearningRate 0.0880 Epoch: 1 Global Step: 15390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:06:02,821-Speed 3369.43 samples/sec Loss 10.1918 LearningRate 0.0880 Epoch: 1 Global Step: 15400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:05,862-Speed 3369.39 samples/sec Loss 10.2551 LearningRate 0.0880 Epoch: 1 Global Step: 15410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:08,879-Speed 3395.04 samples/sec Loss 10.3523 LearningRate 0.0880 Epoch: 1 Global Step: 15420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:11,905-Speed 3384.44 samples/sec Loss 10.1363 LearningRate 0.0880 Epoch: 1 Global Step: 15430 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:14,973-Speed 3339.66 samples/sec Loss 10.3114 LearningRate 0.0880 Epoch: 1 Global Step: 15440 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:18,051-Speed 3327.83 samples/sec Loss 10.2895 LearningRate 0.0879 Epoch: 1 Global Step: 15450 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:21,060-Speed 3403.89 samples/sec Loss 10.1743 LearningRate 0.0879 Epoch: 1 Global Step: 15460 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:24,136-Speed 3330.47 samples/sec Loss 10.2273 LearningRate 0.0879 Epoch: 1 Global Step: 15470 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:27,221-Speed 3319.99 samples/sec Loss 10.1743 LearningRate 0.0879 Epoch: 1 Global Step: 15480 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:30,359-Speed 3264.50 samples/sec Loss 10.1871 LearningRate 0.0879 Epoch: 1 Global Step: 15490 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:33,362-Speed 3411.43 samples/sec Loss 10.2363 LearningRate 0.0879 Epoch: 1 Global Step: 15500 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 
03:06:36,413-Speed 3357.45 samples/sec Loss 10.4052 LearningRate 0.0879 Epoch: 1 Global Step: 15510 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:39,430-Speed 3394.79 samples/sec Loss 10.2591 LearningRate 0.0879 Epoch: 1 Global Step: 15520 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:06:42,496-Speed 3341.70 samples/sec Loss 10.2976 LearningRate 0.0879 Epoch: 1 Global Step: 15530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:45,543-Speed 3360.96 samples/sec Loss 10.2305 LearningRate 0.0879 Epoch: 1 Global Step: 15540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:48,714-Speed 3230.60 samples/sec Loss 10.3352 LearningRate 0.0879 Epoch: 1 Global Step: 15550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:51,743-Speed 3382.10 samples/sec Loss 10.1730 LearningRate 0.0879 Epoch: 1 Global Step: 15560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:54,796-Speed 3354.56 samples/sec Loss 10.3211 LearningRate 0.0879 Epoch: 1 Global Step: 15570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:06:57,789-Speed 3422.94 samples/sec Loss 10.3363 LearningRate 0.0879 Epoch: 1 Global Step: 15580 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:00,851-Speed 3344.71 samples/sec Loss 10.3006 LearningRate 0.0878 Epoch: 1 Global Step: 15590 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:03,932-Speed 3324.89 samples/sec Loss 10.1831 LearningRate 0.0878 Epoch: 1 Global Step: 15600 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:07,008-Speed 3330.25 samples/sec Loss 10.2971 LearningRate 0.0878 Epoch: 1 Global Step: 15610 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:10,016-Speed 3404.83 samples/sec Loss 10.1859 LearningRate 0.0878 Epoch: 1 Global Step: 15620 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:13,083-Speed 3339.91 samples/sec Loss 
10.2661 LearningRate 0.0878 Epoch: 1 Global Step: 15630 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:16,156-Speed 3334.29 samples/sec Loss 10.2641 LearningRate 0.0878 Epoch: 1 Global Step: 15640 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:19,169-Speed 3398.88 samples/sec Loss 10.3933 LearningRate 0.0878 Epoch: 1 Global Step: 15650 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:22,177-Speed 3405.17 samples/sec Loss 10.2045 LearningRate 0.0878 Epoch: 1 Global Step: 15660 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:25,189-Speed 3400.90 samples/sec Loss 10.1289 LearningRate 0.0878 Epoch: 1 Global Step: 15670 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:07:28,254-Speed 3342.32 samples/sec Loss 10.3381 LearningRate 0.0878 Epoch: 1 Global Step: 15680 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:31,286-Speed 3378.93 samples/sec Loss 10.1024 LearningRate 0.0878 Epoch: 1 Global Step: 15690 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:34,347-Speed 3345.86 samples/sec Loss 10.2080 LearningRate 0.0878 Epoch: 1 Global Step: 15700 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:37,373-Speed 3385.49 samples/sec Loss 10.2565 LearningRate 0.0878 Epoch: 1 Global Step: 15710 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:40,400-Speed 3384.04 samples/sec Loss 10.3189 LearningRate 0.0877 Epoch: 1 Global Step: 15720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:43,409-Speed 3403.56 samples/sec Loss 10.1642 LearningRate 0.0877 Epoch: 1 Global Step: 15730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:46,410-Speed 3413.24 samples/sec Loss 10.2109 LearningRate 0.0877 Epoch: 1 Global Step: 15740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:49,478-Speed 3339.18 samples/sec Loss 10.1041 LearningRate 0.0877 Epoch: 1 Global 
Step: 15750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:52,613-Speed 3267.55 samples/sec Loss 10.3169 LearningRate 0.0877 Epoch: 1 Global Step: 15760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:55,643-Speed 3380.86 samples/sec Loss 10.1156 LearningRate 0.0877 Epoch: 1 Global Step: 15770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:07:58,653-Speed 3402.40 samples/sec Loss 10.0387 LearningRate 0.0877 Epoch: 1 Global Step: 15780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:01,735-Speed 3323.78 samples/sec Loss 10.2053 LearningRate 0.0877 Epoch: 1 Global Step: 15790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:04,750-Speed 3397.91 samples/sec Loss 10.2614 LearningRate 0.0877 Epoch: 1 Global Step: 15800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:07,787-Speed 3372.88 samples/sec Loss 10.2569 LearningRate 0.0877 Epoch: 1 Global Step: 15810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:10,805-Speed 3393.27 samples/sec Loss 10.2902 LearningRate 0.0877 Epoch: 1 Global Step: 15820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:13,867-Speed 3346.22 samples/sec Loss 10.1049 LearningRate 0.0877 Epoch: 1 Global Step: 15830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:16,866-Speed 3415.57 samples/sec Loss 10.2507 LearningRate 0.0877 Epoch: 1 Global Step: 15840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:19,881-Speed 3396.50 samples/sec Loss 10.3121 LearningRate 0.0876 Epoch: 1 Global Step: 15850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:22,897-Speed 3396.48 samples/sec Loss 10.2625 LearningRate 0.0876 Epoch: 1 Global Step: 15860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:25,912-Speed 3397.33 samples/sec Loss 10.1614 LearningRate 0.0876 Epoch: 1 Global Step: 15870 Fp16 Grad Scale: 16384 
Required: 20 hours Training: 2022-04-27 03:08:29,017-Speed 3299.04 samples/sec Loss 10.3668 LearningRate 0.0876 Epoch: 1 Global Step: 15880 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:32,052-Speed 3375.45 samples/sec Loss 10.0989 LearningRate 0.0876 Epoch: 1 Global Step: 15890 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:35,057-Speed 3408.56 samples/sec Loss 10.1582 LearningRate 0.0876 Epoch: 1 Global Step: 15900 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:38,065-Speed 3405.60 samples/sec Loss 10.4122 LearningRate 0.0876 Epoch: 1 Global Step: 15910 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:41,095-Speed 3379.88 samples/sec Loss 10.2916 LearningRate 0.0876 Epoch: 1 Global Step: 15920 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:44,137-Speed 3368.15 samples/sec Loss 10.2205 LearningRate 0.0876 Epoch: 1 Global Step: 15930 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:47,186-Speed 3358.69 samples/sec Loss 10.1996 LearningRate 0.0876 Epoch: 1 Global Step: 15940 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:50,197-Speed 3402.57 samples/sec Loss 10.2098 LearningRate 0.0876 Epoch: 1 Global Step: 15950 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:53,199-Speed 3411.79 samples/sec Loss 10.3190 LearningRate 0.0876 Epoch: 1 Global Step: 15960 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:08:56,242-Speed 3366.61 samples/sec Loss 10.3057 LearningRate 0.0876 Epoch: 1 Global Step: 15970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:08:59,292-Speed 3358.91 samples/sec Loss 10.2055 LearningRate 0.0875 Epoch: 1 Global Step: 15980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:02,317-Speed 3386.58 samples/sec Loss 10.3268 LearningRate 0.0875 Epoch: 1 Global Step: 15990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
03:09:05,350-Speed 3376.29 samples/sec Loss 10.2637 LearningRate 0.0875 Epoch: 1 Global Step: 16000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:08,368-Speed 3394.90 samples/sec Loss 10.1933 LearningRate 0.0875 Epoch: 1 Global Step: 16010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:11,418-Speed 3358.16 samples/sec Loss 10.3167 LearningRate 0.0875 Epoch: 1 Global Step: 16020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:14,440-Speed 3389.58 samples/sec Loss 10.3710 LearningRate 0.0875 Epoch: 1 Global Step: 16030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:17,485-Speed 3364.42 samples/sec Loss 10.3780 LearningRate 0.0875 Epoch: 1 Global Step: 16040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:20,486-Speed 3412.83 samples/sec Loss 10.2568 LearningRate 0.0875 Epoch: 1 Global Step: 16050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:23,497-Speed 3402.36 samples/sec Loss 10.3349 LearningRate 0.0875 Epoch: 1 Global Step: 16060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:26,570-Speed 3332.89 samples/sec Loss 10.2384 LearningRate 0.0875 Epoch: 1 Global Step: 16070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:09:29,607-Speed 3372.84 samples/sec Loss 10.2774 LearningRate 0.0875 Epoch: 1 Global Step: 16080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:32,691-Speed 3321.02 samples/sec Loss 10.2271 LearningRate 0.0875 Epoch: 1 Global Step: 16090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:35,734-Speed 3366.30 samples/sec Loss 10.2352 LearningRate 0.0875 Epoch: 1 Global Step: 16100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:38,790-Speed 3352.48 samples/sec Loss 10.1782 LearningRate 0.0875 Epoch: 1 Global Step: 16110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:41,808-Speed 3393.89 samples/sec Loss 
10.2330 LearningRate 0.0874 Epoch: 1 Global Step: 16120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:44,841-Speed 3377.41 samples/sec Loss 10.1738 LearningRate 0.0874 Epoch: 1 Global Step: 16130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:47,886-Speed 3363.67 samples/sec Loss 10.2105 LearningRate 0.0874 Epoch: 1 Global Step: 16140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:50,884-Speed 3416.35 samples/sec Loss 10.2745 LearningRate 0.0874 Epoch: 1 Global Step: 16150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:53,913-Speed 3382.26 samples/sec Loss 10.2114 LearningRate 0.0874 Epoch: 1 Global Step: 16160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:56,916-Speed 3410.37 samples/sec Loss 10.1402 LearningRate 0.0874 Epoch: 1 Global Step: 16170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:09:59,933-Speed 3395.91 samples/sec Loss 10.3800 LearningRate 0.0874 Epoch: 1 Global Step: 16180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:02,996-Speed 3343.77 samples/sec Loss 10.0298 LearningRate 0.0874 Epoch: 1 Global Step: 16190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:06,071-Speed 3331.48 samples/sec Loss 10.2791 LearningRate 0.0874 Epoch: 1 Global Step: 16200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:09,084-Speed 3399.53 samples/sec Loss 10.2183 LearningRate 0.0874 Epoch: 1 Global Step: 16210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:12,098-Speed 3398.46 samples/sec Loss 10.2767 LearningRate 0.0874 Epoch: 1 Global Step: 16220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:15,131-Speed 3377.53 samples/sec Loss 10.1965 LearningRate 0.0874 Epoch: 1 Global Step: 16230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:18,196-Speed 3341.57 samples/sec Loss 10.2425 LearningRate 0.0874 Epoch: 1 Global 
Step: 16240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:21,241-Speed 3364.47 samples/sec Loss 10.3411 LearningRate 0.0873 Epoch: 1 Global Step: 16250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:24,273-Speed 3378.13 samples/sec Loss 10.2384 LearningRate 0.0873 Epoch: 1 Global Step: 16260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:27,302-Speed 3382.13 samples/sec Loss 10.4143 LearningRate 0.0873 Epoch: 1 Global Step: 16270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:30,394-Speed 3312.55 samples/sec Loss 10.2856 LearningRate 0.0873 Epoch: 1 Global Step: 16280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:33,444-Speed 3358.39 samples/sec Loss 10.2315 LearningRate 0.0873 Epoch: 1 Global Step: 16290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:36,480-Speed 3374.48 samples/sec Loss 10.1894 LearningRate 0.0873 Epoch: 1 Global Step: 16300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:39,499-Speed 3392.39 samples/sec Loss 10.1750 LearningRate 0.0873 Epoch: 1 Global Step: 16310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:42,613-Speed 3289.40 samples/sec Loss 10.1946 LearningRate 0.0873 Epoch: 1 Global Step: 16320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:10:45,662-Speed 3359.91 samples/sec Loss 10.2631 LearningRate 0.0873 Epoch: 1 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:48,790-Speed 3273.75 samples/sec Loss 10.1608 LearningRate 0.0873 Epoch: 1 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:51,824-Speed 3376.25 samples/sec Loss 10.3667 LearningRate 0.0873 Epoch: 1 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:10:54,835-Speed 3401.68 samples/sec Loss 10.2183 LearningRate 0.0873 Epoch: 1 Global Step: 16360 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-27 03:10:57,841-Speed 3407.85 samples/sec Loss 10.1603 LearningRate 0.0873 Epoch: 1 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:11:00,878-Speed 3372.92 samples/sec Loss 10.1680 LearningRate 0.0872 Epoch: 1 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:11:03,939-Speed 3346.31 samples/sec Loss 10.1815 LearningRate 0.0872 Epoch: 1 Global Step: 16390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:11:06,975-Speed 3374.05 samples/sec Loss 10.0545 LearningRate 0.0872 Epoch: 1 Global Step: 16400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:09,997-Speed 3390.44 samples/sec Loss 10.1089 LearningRate 0.0872 Epoch: 1 Global Step: 16410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:13,024-Speed 3383.21 samples/sec Loss 10.2090 LearningRate 0.0872 Epoch: 1 Global Step: 16420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:16,116-Speed 3312.96 samples/sec Loss 10.1896 LearningRate 0.0872 Epoch: 1 Global Step: 16430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:19,147-Speed 3379.41 samples/sec Loss 10.2167 LearningRate 0.0872 Epoch: 1 Global Step: 16440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:22,174-Speed 3384.32 samples/sec Loss 10.0794 LearningRate 0.0872 Epoch: 1 Global Step: 16450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:25,269-Speed 3309.30 samples/sec Loss 10.0740 LearningRate 0.0872 Epoch: 1 Global Step: 16460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:28,320-Speed 3356.75 samples/sec Loss 10.1772 LearningRate 0.0872 Epoch: 1 Global Step: 16470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:31,404-Speed 3321.54 samples/sec Loss 10.0980 LearningRate 0.0872 Epoch: 1 Global Step: 16480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
03:11:34,406-Speed 3412.79 samples/sec Loss 10.0872 LearningRate 0.0872 Epoch: 1 Global Step: 16490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:37,425-Speed 3393.12 samples/sec Loss 10.1119 LearningRate 0.0872 Epoch: 1 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:11:40,440-Speed 3396.97 samples/sec Loss 10.2606 LearningRate 0.0871 Epoch: 1 Global Step: 16510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:11:43,434-Speed 3421.20 samples/sec Loss 10.0382 LearningRate 0.0871 Epoch: 1 Global Step: 16520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:46,459-Speed 3387.23 samples/sec Loss 10.2496 LearningRate 0.0871 Epoch: 1 Global Step: 16530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:49,647-Speed 3212.35 samples/sec Loss 10.1857 LearningRate 0.0871 Epoch: 1 Global Step: 16540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:52,658-Speed 3402.03 samples/sec Loss 10.1537 LearningRate 0.0871 Epoch: 1 Global Step: 16550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:55,681-Speed 3389.02 samples/sec Loss 10.1564 LearningRate 0.0871 Epoch: 1 Global Step: 16560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:11:58,680-Speed 3416.04 samples/sec Loss 10.1784 LearningRate 0.0871 Epoch: 1 Global Step: 16570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:01,679-Speed 3414.39 samples/sec Loss 10.1423 LearningRate 0.0871 Epoch: 1 Global Step: 16580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:04,739-Speed 3347.77 samples/sec Loss 10.1887 LearningRate 0.0871 Epoch: 1 Global Step: 16590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:07,740-Speed 3414.13 samples/sec Loss 10.2438 LearningRate 0.0871 Epoch: 1 Global Step: 16600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:10,792-Speed 3355.91 samples/sec Loss 
10.2063 LearningRate 0.0871 Epoch: 1 Global Step: 16610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:13,867-Speed 3331.63 samples/sec Loss 10.1636 LearningRate 0.0871 Epoch: 1 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:16,899-Speed 3378.00 samples/sec Loss 10.1860 LearningRate 0.0871 Epoch: 1 Global Step: 16630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:19,905-Speed 3407.52 samples/sec Loss 9.9315 LearningRate 0.0871 Epoch: 1 Global Step: 16640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:22,929-Speed 3389.00 samples/sec Loss 10.1416 LearningRate 0.0870 Epoch: 1 Global Step: 16650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:26,005-Speed 3329.80 samples/sec Loss 10.1254 LearningRate 0.0870 Epoch: 1 Global Step: 16660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:29,011-Speed 3407.83 samples/sec Loss 10.1142 LearningRate 0.0870 Epoch: 1 Global Step: 16670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:32,029-Speed 3394.53 samples/sec Loss 10.2145 LearningRate 0.0870 Epoch: 1 Global Step: 16680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:35,042-Speed 3399.34 samples/sec Loss 10.2189 LearningRate 0.0870 Epoch: 1 Global Step: 16690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:38,066-Speed 3387.52 samples/sec Loss 10.2548 LearningRate 0.0870 Epoch: 1 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:12:41,064-Speed 3416.90 samples/sec Loss 10.1511 LearningRate 0.0870 Epoch: 1 Global Step: 16710 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:44,071-Speed 3406.43 samples/sec Loss 10.2510 LearningRate 0.0870 Epoch: 1 Global Step: 16720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:47,088-Speed 3395.68 samples/sec Loss 10.1472 LearningRate 0.0870 Epoch: 1 Global 
Step: 16730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:50,091-Speed 3411.21 samples/sec Loss 10.0625 LearningRate 0.0870 Epoch: 1 Global Step: 16740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:53,127-Speed 3372.84 samples/sec Loss 10.0392 LearningRate 0.0870 Epoch: 1 Global Step: 16750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:56,181-Speed 3354.97 samples/sec Loss 9.9947 LearningRate 0.0870 Epoch: 1 Global Step: 16760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:12:59,178-Speed 3417.23 samples/sec Loss 10.2434 LearningRate 0.0870 Epoch: 1 Global Step: 16770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:02,234-Speed 3352.76 samples/sec Loss 10.1393 LearningRate 0.0869 Epoch: 1 Global Step: 16780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:05,233-Speed 3415.14 samples/sec Loss 10.2316 LearningRate 0.0869 Epoch: 1 Global Step: 16790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:08,243-Speed 3402.62 samples/sec Loss 10.1629 LearningRate 0.0869 Epoch: 1 Global Step: 16800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:11,309-Speed 3341.16 samples/sec Loss 10.0271 LearningRate 0.0869 Epoch: 1 Global Step: 16810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:13:14,398-Speed 3316.14 samples/sec Loss 10.2524 LearningRate 0.0869 Epoch: 1 Global Step: 16820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:17,404-Speed 3408.28 samples/sec Loss 10.2111 LearningRate 0.0869 Epoch: 1 Global Step: 16830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:20,433-Speed 3381.99 samples/sec Loss 10.1042 LearningRate 0.0869 Epoch: 1 Global Step: 16840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:23,475-Speed 3367.04 samples/sec Loss 10.1821 LearningRate 0.0869 Epoch: 1 Global Step: 16850 Fp16 Grad Scale: 32768 Required: 
20 hours Training: 2022-04-27 03:13:26,570-Speed 3309.89 samples/sec Loss 10.0361 LearningRate 0.0869 Epoch: 1 Global Step: 16860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:29,618-Speed 3359.51 samples/sec Loss 10.2556 LearningRate 0.0869 Epoch: 1 Global Step: 16870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:32,649-Speed 3380.53 samples/sec Loss 10.1263 LearningRate 0.0869 Epoch: 1 Global Step: 16880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:35,674-Speed 3385.74 samples/sec Loss 10.2395 LearningRate 0.0869 Epoch: 1 Global Step: 16890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:38,706-Speed 3378.58 samples/sec Loss 10.1644 LearningRate 0.0869 Epoch: 1 Global Step: 16900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:41,718-Speed 3400.99 samples/sec Loss 10.1823 LearningRate 0.0868 Epoch: 1 Global Step: 16910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:13:44,758-Speed 3369.06 samples/sec Loss 10.2699 LearningRate 0.0868 Epoch: 1 Global Step: 16920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:13:47,791-Speed 3377.81 samples/sec Loss 9.9648 LearningRate 0.0868 Epoch: 1 Global Step: 16930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:13:50,893-Speed 3301.98 samples/sec Loss 10.3175 LearningRate 0.0868 Epoch: 1 Global Step: 16940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:13:53,974-Speed 3324.68 samples/sec Loss 10.2144 LearningRate 0.0868 Epoch: 1 Global Step: 16950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:13:56,972-Speed 3417.54 samples/sec Loss 10.2591 LearningRate 0.0868 Epoch: 1 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:14:00,011-Speed 3370.62 samples/sec Loss 10.1349 LearningRate 0.0868 Epoch: 1 Global Step: 16970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 
03:14:03,022-Speed 3401.79 samples/sec Loss 9.9934 LearningRate 0.0868 Epoch: 1 Global Step: 16980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:14:06,083-Speed 3346.91 samples/sec Loss 10.1389 LearningRate 0.0868 Epoch: 1 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:14:09,075-Speed 3422.80 samples/sec Loss 10.0603 LearningRate 0.0868 Epoch: 1 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:14:12,121-Speed 3362.94 samples/sec Loss 10.1678 LearningRate 0.0868 Epoch: 1 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:14:15,189-Speed 3339.10 samples/sec Loss 10.1441 LearningRate 0.0868 Epoch: 1 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 03:14:18,229-Speed 3369.86 samples/sec Loss 10.2337 LearningRate 0.0868 Epoch: 1 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:14:21,269-Speed 3369.05 samples/sec Loss 10.1044 LearningRate 0.0868 Epoch: 1 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:14:24,324-Speed 3353.08 samples/sec Loss 10.2387 LearningRate 0.0867 Epoch: 1 Global Step: 17050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:14:27,362-Speed 3371.16 samples/sec Loss 10.0375 LearningRate 0.0867 Epoch: 1 Global Step: 17060 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:30,466-Speed 3301.01 samples/sec Loss 10.0855 LearningRate 0.0867 Epoch: 1 Global Step: 17070 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:33,459-Speed 3421.91 samples/sec Loss 10.1725 LearningRate 0.0867 Epoch: 1 Global Step: 17080 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:36,508-Speed 3359.77 samples/sec Loss 10.1870 LearningRate 0.0867 Epoch: 1 Global Step: 17090 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:39,573-Speed 3341.99 samples/sec Loss 
10.0379 LearningRate 0.0867 Epoch: 1 Global Step: 17100 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:42,608-Speed 3375.32 samples/sec Loss 10.2953 LearningRate 0.0867 Epoch: 1 Global Step: 17110 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:45,631-Speed 3388.42 samples/sec Loss 10.1755 LearningRate 0.0867 Epoch: 1 Global Step: 17120 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:48,709-Speed 3328.37 samples/sec Loss 10.2428 LearningRate 0.0867 Epoch: 1 Global Step: 17130 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:51,761-Speed 3356.26 samples/sec Loss 9.9815 LearningRate 0.0867 Epoch: 1 Global Step: 17140 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:54,783-Speed 3390.19 samples/sec Loss 10.1877 LearningRate 0.0867 Epoch: 1 Global Step: 17150 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:14:57,796-Speed 3398.97 samples/sec Loss 10.1217 LearningRate 0.0867 Epoch: 1 Global Step: 17160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:00,846-Speed 3359.03 samples/sec Loss 10.2963 LearningRate 0.0867 Epoch: 1 Global Step: 17170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:04,016-Speed 3230.79 samples/sec Loss 10.1578 LearningRate 0.0866 Epoch: 1 Global Step: 17180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:07,131-Speed 3288.76 samples/sec Loss 10.2161 LearningRate 0.0866 Epoch: 1 Global Step: 17190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:10,194-Speed 3343.97 samples/sec Loss 10.0435 LearningRate 0.0866 Epoch: 1 Global Step: 17200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:13,273-Speed 3327.61 samples/sec Loss 10.1979 LearningRate 0.0866 Epoch: 1 Global Step: 17210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:16,320-Speed 3362.01 samples/sec Loss 10.2494 LearningRate 0.0866 Epoch: 1 Global 
Step: 17220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:19,390-Speed 3335.70 samples/sec Loss 10.1030 LearningRate 0.0866 Epoch: 1 Global Step: 17230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:22,452-Speed 3345.87 samples/sec Loss 10.0727 LearningRate 0.0866 Epoch: 1 Global Step: 17240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:25,536-Speed 3320.84 samples/sec Loss 10.1156 LearningRate 0.0866 Epoch: 1 Global Step: 17250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:28,596-Speed 3347.84 samples/sec Loss 10.1823 LearningRate 0.0866 Epoch: 1 Global Step: 17260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:31,682-Speed 3319.61 samples/sec Loss 10.1097 LearningRate 0.0866 Epoch: 1 Global Step: 17270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:34,756-Speed 3332.47 samples/sec Loss 10.0852 LearningRate 0.0866 Epoch: 1 Global Step: 17280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:37,817-Speed 3345.57 samples/sec Loss 10.1215 LearningRate 0.0866 Epoch: 1 Global Step: 17290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:40,922-Speed 3299.24 samples/sec Loss 10.2565 LearningRate 0.0866 Epoch: 1 Global Step: 17300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:44,001-Speed 3327.09 samples/sec Loss 10.1275 LearningRate 0.0865 Epoch: 1 Global Step: 17310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:47,040-Speed 3370.85 samples/sec Loss 10.0248 LearningRate 0.0865 Epoch: 1 Global Step: 17320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:50,063-Speed 3388.78 samples/sec Loss 10.0777 LearningRate 0.0865 Epoch: 1 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:15:53,133-Speed 3336.31 samples/sec Loss 10.1636 LearningRate 0.0865 Epoch: 1 Global Step: 17340 Fp16 Grad Scale: 32768 
Required: 20 hours Training: 2022-04-27 03:15:56,149-Speed 3396.14 samples/sec Loss 10.0974 LearningRate 0.0865 Epoch: 1 Global Step: 17350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:15:59,198-Speed 3359.64 samples/sec Loss 10.0220 LearningRate 0.0865 Epoch: 1 Global Step: 17360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:02,249-Speed 3357.25 samples/sec Loss 10.0676 LearningRate 0.0865 Epoch: 1 Global Step: 17370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:05,366-Speed 3286.45 samples/sec Loss 10.0001 LearningRate 0.0865 Epoch: 1 Global Step: 17380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:08,402-Speed 3374.10 samples/sec Loss 10.1135 LearningRate 0.0865 Epoch: 1 Global Step: 17390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:11,404-Speed 3411.24 samples/sec Loss 10.0485 LearningRate 0.0865 Epoch: 1 Global Step: 17400 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:14,454-Speed 3358.74 samples/sec Loss 10.1104 LearningRate 0.0865 Epoch: 1 Global Step: 17410 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:17,491-Speed 3372.95 samples/sec Loss 10.0434 LearningRate 0.0865 Epoch: 1 Global Step: 17420 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:20,557-Speed 3340.58 samples/sec Loss 10.1177 LearningRate 0.0865 Epoch: 1 Global Step: 17430 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:23,578-Speed 3391.43 samples/sec Loss 10.3234 LearningRate 0.0865 Epoch: 1 Global Step: 17440 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:26,712-Speed 3268.15 samples/sec Loss 10.1068 LearningRate 0.0864 Epoch: 1 Global Step: 17450 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:29,727-Speed 3396.83 samples/sec Loss 10.1340 LearningRate 0.0864 Epoch: 1 Global Step: 17460 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 
03:16:32,755-Speed 3383.21 samples/sec Loss 10.1564 LearningRate 0.0864 Epoch: 1 Global Step: 17470 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:35,766-Speed 3402.56 samples/sec Loss 9.9787 LearningRate 0.0864 Epoch: 1 Global Step: 17480 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:38,778-Speed 3400.88 samples/sec Loss 10.1254 LearningRate 0.0864 Epoch: 1 Global Step: 17490 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:16:41,786-Speed 3405.04 samples/sec Loss 10.1153 LearningRate 0.0864 Epoch: 1 Global Step: 17500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:44,800-Speed 3398.78 samples/sec Loss 10.1368 LearningRate 0.0864 Epoch: 1 Global Step: 17510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:47,880-Speed 3325.44 samples/sec Loss 10.2076 LearningRate 0.0864 Epoch: 1 Global Step: 17520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:50,920-Speed 3369.59 samples/sec Loss 10.1378 LearningRate 0.0864 Epoch: 1 Global Step: 17530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:53,989-Speed 3337.49 samples/sec Loss 10.2293 LearningRate 0.0864 Epoch: 1 Global Step: 17540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:16:57,034-Speed 3364.17 samples/sec Loss 10.1363 LearningRate 0.0864 Epoch: 1 Global Step: 17550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:00,038-Speed 3409.88 samples/sec Loss 10.1949 LearningRate 0.0864 Epoch: 1 Global Step: 17560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:03,101-Speed 3344.83 samples/sec Loss 10.0561 LearningRate 0.0864 Epoch: 1 Global Step: 17570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:06,119-Speed 3393.74 samples/sec Loss 10.1149 LearningRate 0.0863 Epoch: 1 Global Step: 17580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:09,122-Speed 3411.15 samples/sec Loss 
10.1137 LearningRate 0.0863 Epoch: 1 Global Step: 17590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:12,150-Speed 3382.86 samples/sec Loss 9.9956 LearningRate 0.0863 Epoch: 1 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:17:15,150-Speed 3413.28 samples/sec Loss 9.9887 LearningRate 0.0863 Epoch: 1 Global Step: 17610 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:18,228-Speed 3328.40 samples/sec Loss 10.0884 LearningRate 0.0863 Epoch: 1 Global Step: 17620 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:21,251-Speed 3387.85 samples/sec Loss 10.1135 LearningRate 0.0863 Epoch: 1 Global Step: 17630 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:24,316-Speed 3342.70 samples/sec Loss 9.9540 LearningRate 0.0863 Epoch: 1 Global Step: 17640 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:27,334-Speed 3393.97 samples/sec Loss 10.1263 LearningRate 0.0863 Epoch: 1 Global Step: 17650 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:30,392-Speed 3349.35 samples/sec Loss 10.1259 LearningRate 0.0863 Epoch: 1 Global Step: 17660 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:33,412-Speed 3392.36 samples/sec Loss 10.0707 LearningRate 0.0863 Epoch: 1 Global Step: 17670 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:36,431-Speed 3392.89 samples/sec Loss 9.9566 LearningRate 0.0863 Epoch: 1 Global Step: 17680 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:39,433-Speed 3411.67 samples/sec Loss 10.0962 LearningRate 0.0863 Epoch: 1 Global Step: 17690 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:42,470-Speed 3372.85 samples/sec Loss 10.1136 LearningRate 0.0863 Epoch: 1 Global Step: 17700 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:17:45,480-Speed 3403.95 samples/sec Loss 9.9566 LearningRate 0.0863 Epoch: 1 Global 
Step: 17710 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:48,472-Speed 3422.89 samples/sec Loss 10.0330 LearningRate 0.0862 Epoch: 1 Global Step: 17720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:51,534-Speed 3345.45 samples/sec Loss 10.1347 LearningRate 0.0862 Epoch: 1 Global Step: 17730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:54,565-Speed 3379.96 samples/sec Loss 10.1062 LearningRate 0.0862 Epoch: 1 Global Step: 17740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:17:57,586-Speed 3390.31 samples/sec Loss 9.9686 LearningRate 0.0862 Epoch: 1 Global Step: 17750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:00,583-Speed 3418.45 samples/sec Loss 9.9944 LearningRate 0.0862 Epoch: 1 Global Step: 17760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:03,625-Speed 3367.24 samples/sec Loss 10.1130 LearningRate 0.0862 Epoch: 1 Global Step: 17770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:06,663-Speed 3371.05 samples/sec Loss 10.1266 LearningRate 0.0862 Epoch: 1 Global Step: 17780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:09,647-Speed 3432.82 samples/sec Loss 10.1633 LearningRate 0.0862 Epoch: 1 Global Step: 17790 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:12,680-Speed 3377.98 samples/sec Loss 10.2494 LearningRate 0.0862 Epoch: 1 Global Step: 17800 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:15,755-Speed 3331.13 samples/sec Loss 10.0498 LearningRate 0.0862 Epoch: 1 Global Step: 17810 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:18,809-Speed 3353.32 samples/sec Loss 9.8818 LearningRate 0.0862 Epoch: 1 Global Step: 17820 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:21,811-Speed 3412.47 samples/sec Loss 10.0440 LearningRate 0.0862 Epoch: 1 Global Step: 17830 Fp16 Grad Scale: 16384 Required: 
20 hours Training: 2022-04-27 03:18:24,861-Speed 3358.44 samples/sec Loss 10.0072 LearningRate 0.0862 Epoch: 1 Global Step: 17840 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:27,891-Speed 3380.83 samples/sec Loss 10.1235 LearningRate 0.0861 Epoch: 1 Global Step: 17850 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:30,967-Speed 3330.24 samples/sec Loss 10.0564 LearningRate 0.0861 Epoch: 1 Global Step: 17860 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:34,025-Speed 3349.73 samples/sec Loss 10.0060 LearningRate 0.0861 Epoch: 1 Global Step: 17870 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:37,090-Speed 3341.76 samples/sec Loss 10.0399 LearningRate 0.0861 Epoch: 1 Global Step: 17880 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:18:40,151-Speed 3346.02 samples/sec Loss 10.0216 LearningRate 0.0861 Epoch: 1 Global Step: 17890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:43,193-Speed 3367.73 samples/sec Loss 10.0870 LearningRate 0.0861 Epoch: 1 Global Step: 17900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:46,251-Speed 3349.65 samples/sec Loss 10.0556 LearningRate 0.0861 Epoch: 1 Global Step: 17910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:49,344-Speed 3311.76 samples/sec Loss 9.9864 LearningRate 0.0861 Epoch: 1 Global Step: 17920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:52,396-Speed 3356.54 samples/sec Loss 10.0052 LearningRate 0.0861 Epoch: 1 Global Step: 17930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:55,425-Speed 3381.92 samples/sec Loss 10.0760 LearningRate 0.0861 Epoch: 1 Global Step: 17940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:18:58,415-Speed 3425.13 samples/sec Loss 10.0788 LearningRate 0.0861 Epoch: 1 Global Step: 17950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
03:19:01,465-Speed 3358.62 samples/sec Loss 9.9897 LearningRate 0.0861 Epoch: 1 Global Step: 17960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:04,484-Speed 3392.96 samples/sec Loss 10.1063 LearningRate 0.0861 Epoch: 1 Global Step: 17970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:07,544-Speed 3348.38 samples/sec Loss 9.9881 LearningRate 0.0860 Epoch: 1 Global Step: 17980 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:10,549-Speed 3408.51 samples/sec Loss 9.9067 LearningRate 0.0860 Epoch: 1 Global Step: 17990 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:13,613-Speed 3342.63 samples/sec Loss 10.0180 LearningRate 0.0860 Epoch: 1 Global Step: 18000 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:16,655-Speed 3367.93 samples/sec Loss 10.0647 LearningRate 0.0860 Epoch: 1 Global Step: 18010 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:19,651-Speed 3418.70 samples/sec Loss 9.9634 LearningRate 0.0860 Epoch: 1 Global Step: 18020 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:22,667-Speed 3396.36 samples/sec Loss 10.0078 LearningRate 0.0860 Epoch: 1 Global Step: 18030 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:25,680-Speed 3398.89 samples/sec Loss 10.0720 LearningRate 0.0860 Epoch: 1 Global Step: 18040 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:28,676-Speed 3420.12 samples/sec Loss 10.0371 LearningRate 0.0860 Epoch: 1 Global Step: 18050 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:31,687-Speed 3400.89 samples/sec Loss 9.9894 LearningRate 0.0860 Epoch: 1 Global Step: 18060 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:34,747-Speed 3348.22 samples/sec Loss 10.0576 LearningRate 0.0860 Epoch: 1 Global Step: 18070 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:19:37,819-Speed 3333.90 samples/sec Loss 
10.0521 LearningRate 0.0860 Epoch: 1 Global Step: 18080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:40,847-Speed 3383.59 samples/sec Loss 10.0149 LearningRate 0.0860 Epoch: 1 Global Step: 18090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:43,905-Speed 3349.10 samples/sec Loss 9.9215 LearningRate 0.0860 Epoch: 1 Global Step: 18100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:46,984-Speed 3327.53 samples/sec Loss 10.1061 LearningRate 0.0860 Epoch: 1 Global Step: 18110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:50,029-Speed 3363.77 samples/sec Loss 10.0600 LearningRate 0.0859 Epoch: 1 Global Step: 18120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:53,132-Speed 3300.88 samples/sec Loss 10.1057 LearningRate 0.0859 Epoch: 1 Global Step: 18130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:56,153-Speed 3390.51 samples/sec Loss 9.9427 LearningRate 0.0859 Epoch: 1 Global Step: 18140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:19:59,220-Speed 3340.19 samples/sec Loss 9.9322 LearningRate 0.0859 Epoch: 1 Global Step: 18150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:02,319-Speed 3305.05 samples/sec Loss 9.9200 LearningRate 0.0859 Epoch: 1 Global Step: 18160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:05,329-Speed 3402.87 samples/sec Loss 9.9621 LearningRate 0.0859 Epoch: 1 Global Step: 18170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:08,394-Speed 3342.10 samples/sec Loss 9.9423 LearningRate 0.0859 Epoch: 1 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:20:11,462-Speed 3339.14 samples/sec Loss 9.9244 LearningRate 0.0859 Epoch: 1 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:20:14,519-Speed 3350.99 samples/sec Loss 10.0227 LearningRate 0.0859 Epoch: 1 Global Step: 
18200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:20:17,611-Speed 3313.09 samples/sec Loss 10.0029 LearningRate 0.0859 Epoch: 1 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:20:20,624-Speed 3399.29 samples/sec Loss 9.8568 LearningRate 0.0859 Epoch: 1 Global Step: 18220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:20:23,643-Speed 3393.09 samples/sec Loss 9.9295 LearningRate 0.0859 Epoch: 1 Global Step: 18230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:26,671-Speed 3382.81 samples/sec Loss 10.0609 LearningRate 0.0859 Epoch: 1 Global Step: 18240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:29,774-Speed 3301.47 samples/sec Loss 10.1152 LearningRate 0.0858 Epoch: 1 Global Step: 18250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:32,811-Speed 3372.50 samples/sec Loss 10.0440 LearningRate 0.0858 Epoch: 1 Global Step: 18260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:35,881-Speed 3336.16 samples/sec Loss 10.0602 LearningRate 0.0858 Epoch: 1 Global Step: 18270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:38,954-Speed 3333.92 samples/sec Loss 9.9329 LearningRate 0.0858 Epoch: 1 Global Step: 18280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:41,980-Speed 3384.90 samples/sec Loss 9.9360 LearningRate 0.0858 Epoch: 1 Global Step: 18290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:45,003-Speed 3389.03 samples/sec Loss 10.0579 LearningRate 0.0858 Epoch: 1 Global Step: 18300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:48,059-Speed 3351.22 samples/sec Loss 9.8425 LearningRate 0.0858 Epoch: 1 Global Step: 18310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:20:51,123-Speed 3343.33 samples/sec Loss 9.9733 LearningRate 0.0858 Epoch: 1 Global Step: 18320 Fp16 Grad Scale: 32768 Required: 20 hours 
Training: 2022-04-27 03:20:54,157-Speed 3376.01 samples/sec Loss 9.9135 LearningRate 0.0858 Epoch: 1 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:20:57,190-Speed 3377.53 samples/sec Loss 10.1286 LearningRate 0.0858 Epoch: 1 Global Step: 18340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:00,227-Speed 3373.80 samples/sec Loss 9.9761 LearningRate 0.0858 Epoch: 1 Global Step: 18350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:03,330-Speed 3300.41 samples/sec Loss 9.9737 LearningRate 0.0858 Epoch: 1 Global Step: 18360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:06,452-Speed 3280.78 samples/sec Loss 10.0341 LearningRate 0.0858 Epoch: 1 Global Step: 18370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:09,464-Speed 3400.67 samples/sec Loss 9.9491 LearningRate 0.0857 Epoch: 1 Global Step: 18380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:12,536-Speed 3335.02 samples/sec Loss 9.8478 LearningRate 0.0857 Epoch: 1 Global Step: 18390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:15,601-Speed 3341.45 samples/sec Loss 9.9802 LearningRate 0.0857 Epoch: 1 Global Step: 18400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:18,658-Speed 3351.55 samples/sec Loss 9.9196 LearningRate 0.0857 Epoch: 1 Global Step: 18410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:21,689-Speed 3379.03 samples/sec Loss 10.0160 LearningRate 0.0857 Epoch: 1 Global Step: 18420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:24,747-Speed 3349.86 samples/sec Loss 10.0355 LearningRate 0.0857 Epoch: 1 Global Step: 18430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:27,763-Speed 3396.53 samples/sec Loss 10.0131 LearningRate 0.0857 Epoch: 1 Global Step: 18440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:21:30,832-Speed 3337.53 
samples/sec Loss 10.0332 LearningRate 0.0857 Epoch: 1 Global Step: 18450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:21:33,856-Speed 3386.79 samples/sec Loss 10.0562 LearningRate 0.0857 Epoch: 1 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:21:36,878-Speed 3390.21 samples/sec Loss 9.8407 LearningRate 0.0857 Epoch: 1 Global Step: 18470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:21:39,907-Speed 3381.20 samples/sec Loss 10.0058 LearningRate 0.0857 Epoch: 1 Global Step: 18480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:21:42,967-Speed 3347.25 samples/sec Loss 9.9474 LearningRate 0.0857 Epoch: 1 Global Step: 18490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:21:45,997-Speed 3381.40 samples/sec Loss 10.0551 LearningRate 0.0857 Epoch: 1 Global Step: 18500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:49,016-Speed 3391.87 samples/sec Loss 9.9833 LearningRate 0.0857 Epoch: 1 Global Step: 18510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:52,069-Speed 3355.59 samples/sec Loss 10.0417 LearningRate 0.0856 Epoch: 1 Global Step: 18520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:55,143-Speed 3332.52 samples/sec Loss 9.9416 LearningRate 0.0856 Epoch: 1 Global Step: 18530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:21:58,224-Speed 3324.24 samples/sec Loss 9.9348 LearningRate 0.0856 Epoch: 1 Global Step: 18540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:01,273-Speed 3359.70 samples/sec Loss 10.1019 LearningRate 0.0856 Epoch: 1 Global Step: 18550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:04,343-Speed 3336.74 samples/sec Loss 10.0594 LearningRate 0.0856 Epoch: 1 Global Step: 18560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:07,354-Speed 3402.80 samples/sec Loss 9.9035 LearningRate 0.0856 
Epoch: 1 Global Step: 18570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:10,359-Speed 3408.49 samples/sec Loss 9.9072 LearningRate 0.0856 Epoch: 1 Global Step: 18580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:13,432-Speed 3332.62 samples/sec Loss 10.1570 LearningRate 0.0856 Epoch: 1 Global Step: 18590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:16,489-Speed 3351.29 samples/sec Loss 9.9714 LearningRate 0.0856 Epoch: 1 Global Step: 18600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:22:19,523-Speed 3376.27 samples/sec Loss 9.9254 LearningRate 0.0856 Epoch: 1 Global Step: 18610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:22:22,600-Speed 3329.38 samples/sec Loss 10.0446 LearningRate 0.0856 Epoch: 1 Global Step: 18620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:22:25,762-Speed 3239.52 samples/sec Loss 10.0463 LearningRate 0.0856 Epoch: 1 Global Step: 18630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:22:28,771-Speed 3403.26 samples/sec Loss 9.9387 LearningRate 0.0856 Epoch: 1 Global Step: 18640 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:31,827-Speed 3352.42 samples/sec Loss 10.0871 LearningRate 0.0855 Epoch: 1 Global Step: 18650 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:34,851-Speed 3387.60 samples/sec Loss 10.0546 LearningRate 0.0855 Epoch: 1 Global Step: 18660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:22:37,867-Speed 3397.12 samples/sec Loss 10.0488 LearningRate 0.0855 Epoch: 1 Global Step: 18670 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:22:40,913-Speed 3362.67 samples/sec Loss 9.9695 LearningRate 0.0855 Epoch: 1 Global Step: 18680 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:22:43,935-Speed 3389.19 samples/sec Loss 9.9022 LearningRate 0.0855 Epoch: 1 Global Step: 18690 Fp16 Grad Scale: 
16384 Required: 20 hours Training: 2022-04-27 03:22:46,988-Speed 3355.69 samples/sec Loss 9.9957 LearningRate 0.0855 Epoch: 1 Global Step: 18700 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:22:50,057-Speed 3337.09 samples/sec Loss 9.9642 LearningRate 0.0855 Epoch: 1 Global Step: 18710 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:22:53,105-Speed 3360.70 samples/sec Loss 10.0152 LearningRate 0.0855 Epoch: 1 Global Step: 18720 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:22:56,121-Speed 3397.39 samples/sec Loss 9.9912 LearningRate 0.0855 Epoch: 1 Global Step: 18730 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:22:59,150-Speed 3381.73 samples/sec Loss 9.9961 LearningRate 0.0855 Epoch: 1 Global Step: 18740 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:02,214-Speed 3343.64 samples/sec Loss 10.0501 LearningRate 0.0855 Epoch: 1 Global Step: 18750 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:05,255-Speed 3368.04 samples/sec Loss 9.9226 LearningRate 0.0855 Epoch: 1 Global Step: 18760 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:08,306-Speed 3357.37 samples/sec Loss 9.8695 LearningRate 0.0855 Epoch: 1 Global Step: 18770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:23:11,363-Speed 3350.88 samples/sec Loss 9.9575 LearningRate 0.0855 Epoch: 1 Global Step: 18780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:23:14,443-Speed 3326.45 samples/sec Loss 9.9790 LearningRate 0.0854 Epoch: 1 Global Step: 18790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:23:17,509-Speed 3339.96 samples/sec Loss 9.9603 LearningRate 0.0854 Epoch: 1 Global Step: 18800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:23:20,562-Speed 3355.19 samples/sec Loss 9.8801 LearningRate 0.0854 Epoch: 1 Global Step: 18810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 
03:23:23,639-Speed 3329.94 samples/sec Loss 9.9572 LearningRate 0.0854 Epoch: 1 Global Step: 18820 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:26,738-Speed 3305.34 samples/sec Loss 9.8359 LearningRate 0.0854 Epoch: 1 Global Step: 18830 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:29,783-Speed 3363.72 samples/sec Loss 9.8734 LearningRate 0.0854 Epoch: 1 Global Step: 18840 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:32,839-Speed 3351.23 samples/sec Loss 9.8655 LearningRate 0.0854 Epoch: 1 Global Step: 18850 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:35,874-Speed 3374.96 samples/sec Loss 9.9487 LearningRate 0.0854 Epoch: 1 Global Step: 18860 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:38,904-Speed 3381.24 samples/sec Loss 9.8889 LearningRate 0.0854 Epoch: 1 Global Step: 18870 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:41,925-Speed 3389.91 samples/sec Loss 10.0002 LearningRate 0.0854 Epoch: 1 Global Step: 18880 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:44,962-Speed 3373.28 samples/sec Loss 9.9063 LearningRate 0.0854 Epoch: 1 Global Step: 18890 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:47,976-Speed 3398.06 samples/sec Loss 10.0038 LearningRate 0.0854 Epoch: 1 Global Step: 18900 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:51,065-Speed 3316.02 samples/sec Loss 9.8384 LearningRate 0.0854 Epoch: 1 Global Step: 18910 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-04-27 03:23:54,122-Speed 3351.44 samples/sec Loss 10.0180 LearningRate 0.0853 Epoch: 1 Global Step: 18920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:23:57,154-Speed 3378.41 samples/sec Loss 9.8570 LearningRate 0.0853 Epoch: 1 Global Step: 18930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:00,171-Speed 3394.94 samples/sec Loss 9.8264 
LearningRate 0.0853 Epoch: 1 Global Step: 18940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:03,312-Speed 3260.78 samples/sec Loss 9.8925 LearningRate 0.0853 Epoch: 1 Global Step: 18950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:06,374-Speed 3345.53 samples/sec Loss 9.9048 LearningRate 0.0853 Epoch: 1 Global Step: 18960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:09,382-Speed 3405.27 samples/sec Loss 9.9722 LearningRate 0.0853 Epoch: 1 Global Step: 18970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:13,122-Speed 2738.71 samples/sec Loss 10.0353 LearningRate 0.0853 Epoch: 1 Global Step: 18980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:16,145-Speed 3387.68 samples/sec Loss 9.8756 LearningRate 0.0853 Epoch: 1 Global Step: 18990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:19,238-Speed 3311.75 samples/sec Loss 9.8542 LearningRate 0.0853 Epoch: 1 Global Step: 19000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:22,250-Speed 3401.16 samples/sec Loss 9.7604 LearningRate 0.0853 Epoch: 1 Global Step: 19010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:25,289-Speed 3370.69 samples/sec Loss 9.8783 LearningRate 0.0853 Epoch: 1 Global Step: 19020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:24:28,331-Speed 3367.23 samples/sec Loss 10.0290 LearningRate 0.0853 Epoch: 1 Global Step: 19030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:24:31,421-Speed 3315.07 samples/sec Loss 9.9001 LearningRate 0.0853 Epoch: 1 Global Step: 19040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:24:34,476-Speed 3352.69 samples/sec Loss 9.9511 LearningRate 0.0853 Epoch: 1 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:24:37,495-Speed 3393.54 samples/sec Loss 9.9342 LearningRate 0.0852 Epoch: 1 Global Step: 19060 Fp16 
Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:40,538-Speed 3365.80 samples/sec Loss 10.0683 LearningRate 0.0852 Epoch: 1 Global Step: 19070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:43,596-Speed 3349.71 samples/sec Loss 9.8898 LearningRate 0.0852 Epoch: 1 Global Step: 19080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:46,668-Speed 3334.25 samples/sec Loss 9.9270 LearningRate 0.0852 Epoch: 1 Global Step: 19090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:49,695-Speed 3383.61 samples/sec Loss 9.9671 LearningRate 0.0852 Epoch: 1 Global Step: 19100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:52,740-Speed 3364.57 samples/sec Loss 9.8891 LearningRate 0.0852 Epoch: 1 Global Step: 19110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:55,780-Speed 3369.24 samples/sec Loss 9.9272 LearningRate 0.0852 Epoch: 1 Global Step: 19120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:24:58,787-Speed 3406.79 samples/sec Loss 9.7440 LearningRate 0.0852 Epoch: 1 Global Step: 19130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:01,834-Speed 3361.72 samples/sec Loss 9.9429 LearningRate 0.0852 Epoch: 1 Global Step: 19140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:04,865-Speed 3379.23 samples/sec Loss 9.9204 LearningRate 0.0852 Epoch: 1 Global Step: 19150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:07,890-Speed 3386.29 samples/sec Loss 9.8290 LearningRate 0.0852 Epoch: 1 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:25:10,929-Speed 3370.33 samples/sec Loss 9.8875 LearningRate 0.0852 Epoch: 1 Global Step: 19170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:14,035-Speed 3298.79 samples/sec Loss 9.9927 LearningRate 0.0852 Epoch: 1 Global Step: 19180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-04-27 03:25:17,116-Speed 3324.01 samples/sec Loss 9.8329 LearningRate 0.0851 Epoch: 1 Global Step: 19190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:20,191-Speed 3330.96 samples/sec Loss 9.9485 LearningRate 0.0851 Epoch: 1 Global Step: 19200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:23,221-Speed 3380.59 samples/sec Loss 9.8067 LearningRate 0.0851 Epoch: 1 Global Step: 19210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:26,268-Speed 3362.29 samples/sec Loss 10.0051 LearningRate 0.0851 Epoch: 1 Global Step: 19220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:29,314-Speed 3363.20 samples/sec Loss 9.8698 LearningRate 0.0851 Epoch: 1 Global Step: 19230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:32,369-Speed 3352.76 samples/sec Loss 9.7963 LearningRate 0.0851 Epoch: 1 Global Step: 19240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:35,413-Speed 3365.27 samples/sec Loss 9.9718 LearningRate 0.0851 Epoch: 1 Global Step: 19250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:38,513-Speed 3304.68 samples/sec Loss 9.8285 LearningRate 0.0851 Epoch: 1 Global Step: 19260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:41,631-Speed 3284.65 samples/sec Loss 9.8284 LearningRate 0.0851 Epoch: 1 Global Step: 19270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:25:44,668-Speed 3372.70 samples/sec Loss 9.8537 LearningRate 0.0851 Epoch: 1 Global Step: 19280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:47,735-Speed 3340.46 samples/sec Loss 9.8322 LearningRate 0.0851 Epoch: 1 Global Step: 19290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:50,847-Speed 3290.90 samples/sec Loss 9.8516 LearningRate 0.0851 Epoch: 1 Global Step: 19300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:53,932-Speed 3319.72 samples/sec Loss 
9.8191 LearningRate 0.0851 Epoch: 1 Global Step: 19310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:25:56,989-Speed 3351.08 samples/sec Loss 9.9994 LearningRate 0.0851 Epoch: 1 Global Step: 19320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:00,018-Speed 3381.84 samples/sec Loss 9.8855 LearningRate 0.0850 Epoch: 1 Global Step: 19330 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:03,084-Speed 3341.17 samples/sec Loss 9.9254 LearningRate 0.0850 Epoch: 1 Global Step: 19340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:06,158-Speed 3331.84 samples/sec Loss 9.8325 LearningRate 0.0850 Epoch: 1 Global Step: 19350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:09,162-Speed 3409.77 samples/sec Loss 9.8702 LearningRate 0.0850 Epoch: 1 Global Step: 19360 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:12,197-Speed 3374.78 samples/sec Loss 9.7409 LearningRate 0.0850 Epoch: 1 Global Step: 19370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:15,259-Speed 3345.43 samples/sec Loss 9.8127 LearningRate 0.0850 Epoch: 1 Global Step: 19380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:18,331-Speed 3334.83 samples/sec Loss 9.8615 LearningRate 0.0850 Epoch: 1 Global Step: 19390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:21,379-Speed 3360.75 samples/sec Loss 9.9673 LearningRate 0.0850 Epoch: 1 Global Step: 19400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:24,429-Speed 3358.86 samples/sec Loss 9.7779 LearningRate 0.0850 Epoch: 1 Global Step: 19410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:27,528-Speed 3305.40 samples/sec Loss 9.7883 LearningRate 0.0850 Epoch: 1 Global Step: 19420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:30,628-Speed 3303.20 samples/sec Loss 9.8580 LearningRate 0.0850 Epoch: 1 Global Step: 19430 
Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:33,662-Speed 3376.39 samples/sec Loss 9.8907 LearningRate 0.0850 Epoch: 1 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:36,718-Speed 3352.53 samples/sec Loss 9.7914 LearningRate 0.0850 Epoch: 1 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:39,773-Speed 3352.40 samples/sec Loss 9.9305 LearningRate 0.0849 Epoch: 1 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:42,854-Speed 3324.61 samples/sec Loss 9.7989 LearningRate 0.0849 Epoch: 1 Global Step: 19470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:26:45,851-Speed 3417.65 samples/sec Loss 9.7893 LearningRate 0.0849 Epoch: 1 Global Step: 19480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:48,865-Speed 3399.34 samples/sec Loss 9.8533 LearningRate 0.0849 Epoch: 1 Global Step: 19490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:51,924-Speed 3348.82 samples/sec Loss 9.8819 LearningRate 0.0849 Epoch: 1 Global Step: 19500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:54,966-Speed 3367.14 samples/sec Loss 9.7386 LearningRate 0.0849 Epoch: 1 Global Step: 19510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:26:57,976-Speed 3403.28 samples/sec Loss 9.8307 LearningRate 0.0849 Epoch: 1 Global Step: 19520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:01,018-Speed 3367.25 samples/sec Loss 9.9171 LearningRate 0.0849 Epoch: 1 Global Step: 19530 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:04,123-Speed 3298.11 samples/sec Loss 9.8031 LearningRate 0.0849 Epoch: 1 Global Step: 19540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:07,192-Speed 3338.38 samples/sec Loss 9.8169 LearningRate 0.0849 Epoch: 1 Global Step: 19550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-04-27 03:27:10,216-Speed 3387.55 samples/sec Loss 9.7139 LearningRate 0.0849 Epoch: 1 Global Step: 19560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:13,258-Speed 3367.11 samples/sec Loss 9.8778 LearningRate 0.0849 Epoch: 1 Global Step: 19570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:16,326-Speed 3339.02 samples/sec Loss 9.7767 LearningRate 0.0849 Epoch: 1 Global Step: 19580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:19,386-Speed 3347.28 samples/sec Loss 9.8335 LearningRate 0.0849 Epoch: 1 Global Step: 19590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:22,418-Speed 3377.91 samples/sec Loss 9.7364 LearningRate 0.0848 Epoch: 1 Global Step: 19600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:25,497-Speed 3327.47 samples/sec Loss 9.7512 LearningRate 0.0848 Epoch: 1 Global Step: 19610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:28,529-Speed 3377.82 samples/sec Loss 9.8222 LearningRate 0.0848 Epoch: 1 Global Step: 19620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:31,580-Speed 3357.58 samples/sec Loss 9.8132 LearningRate 0.0848 Epoch: 1 Global Step: 19630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:34,598-Speed 3393.92 samples/sec Loss 9.8801 LearningRate 0.0848 Epoch: 1 Global Step: 19640 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:37,621-Speed 3388.43 samples/sec Loss 9.8943 LearningRate 0.0848 Epoch: 1 Global Step: 19650 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:40,670-Speed 3358.75 samples/sec Loss 9.8779 LearningRate 0.0848 Epoch: 1 Global Step: 19660 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:43,697-Speed 3384.45 samples/sec Loss 9.8127 LearningRate 0.0848 Epoch: 1 Global Step: 19670 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:27:46,745-Speed 3360.36 samples/sec Loss 
9.9897 LearningRate 0.0848 Epoch: 1 Global Step: 19680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:27:49,773-Speed 3383.53 samples/sec Loss 9.7835 LearningRate 0.0848 Epoch: 1 Global Step: 19690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:27:52,807-Speed 3375.93 samples/sec Loss 9.7623 LearningRate 0.0848 Epoch: 1 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:27:55,858-Speed 3358.11 samples/sec Loss 9.7805 LearningRate 0.0848 Epoch: 1 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:27:58,873-Speed 3396.43 samples/sec Loss 9.9253 LearningRate 0.0848 Epoch: 1 Global Step: 19720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:01,891-Speed 3394.02 samples/sec Loss 9.9009 LearningRate 0.0847 Epoch: 1 Global Step: 19730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:04,919-Speed 3382.94 samples/sec Loss 9.7844 LearningRate 0.0847 Epoch: 1 Global Step: 19740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:07,927-Speed 3405.56 samples/sec Loss 9.8012 LearningRate 0.0847 Epoch: 1 Global Step: 19750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:10,966-Speed 3370.60 samples/sec Loss 9.8882 LearningRate 0.0847 Epoch: 1 Global Step: 19760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:13,990-Speed 3386.86 samples/sec Loss 9.7378 LearningRate 0.0847 Epoch: 1 Global Step: 19770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:17,022-Speed 3379.17 samples/sec Loss 9.7228 LearningRate 0.0847 Epoch: 1 Global Step: 19780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:20,046-Speed 3387.51 samples/sec Loss 9.8186 LearningRate 0.0847 Epoch: 1 Global Step: 19790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:23,121-Speed 3330.89 samples/sec Loss 9.8252 LearningRate 0.0847 Epoch: 1 Global Step: 19800 
Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:26,147-Speed 3385.43 samples/sec Loss 9.7737 LearningRate 0.0847 Epoch: 1 Global Step: 19810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:29,228-Speed 3324.26 samples/sec Loss 9.8384 LearningRate 0.0847 Epoch: 1 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:28:32,253-Speed 3386.13 samples/sec Loss 10.0563 LearningRate 0.0847 Epoch: 1 Global Step: 19830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:35,296-Speed 3366.26 samples/sec Loss 9.7242 LearningRate 0.0847 Epoch: 1 Global Step: 19840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:38,353-Speed 3350.56 samples/sec Loss 9.8071 LearningRate 0.0847 Epoch: 1 Global Step: 19850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:41,392-Speed 3371.14 samples/sec Loss 9.9053 LearningRate 0.0847 Epoch: 1 Global Step: 19860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:44,433-Speed 3367.99 samples/sec Loss 9.8142 LearningRate 0.0846 Epoch: 1 Global Step: 19870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:47,513-Speed 3326.05 samples/sec Loss 9.7451 LearningRate 0.0846 Epoch: 1 Global Step: 19880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:50,582-Speed 3337.54 samples/sec Loss 9.7990 LearningRate 0.0846 Epoch: 1 Global Step: 19890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:53,681-Speed 3305.47 samples/sec Loss 9.4872 LearningRate 0.0846 Epoch: 1 Global Step: 19900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:56,719-Speed 3371.53 samples/sec Loss 9.9015 LearningRate 0.0846 Epoch: 1 Global Step: 19910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:28:59,796-Speed 3328.38 samples/sec Loss 9.8751 LearningRate 0.0846 Epoch: 1 Global Step: 19920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-04-27 03:29:02,814-Speed 3394.33 samples/sec Loss 9.8005 LearningRate 0.0846 Epoch: 1 Global Step: 19930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:05,854-Speed 3369.65 samples/sec Loss 9.7338 LearningRate 0.0846 Epoch: 1 Global Step: 19940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:08,875-Speed 3390.34 samples/sec Loss 9.7452 LearningRate 0.0846 Epoch: 1 Global Step: 19950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:11,919-Speed 3365.08 samples/sec Loss 9.6677 LearningRate 0.0846 Epoch: 1 Global Step: 19960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:14,963-Speed 3365.70 samples/sec Loss 9.7825 LearningRate 0.0846 Epoch: 1 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:17,991-Speed 3382.71 samples/sec Loss 9.6926 LearningRate 0.0846 Epoch: 1 Global Step: 19980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:20,997-Speed 3407.67 samples/sec Loss 9.7032 LearningRate 0.0846 Epoch: 1 Global Step: 19990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:24,045-Speed 3360.74 samples/sec Loss 9.7517 LearningRate 0.0845 Epoch: 1 Global Step: 20000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:27,087-Speed 3367.53 samples/sec Loss 9.8456 LearningRate 0.0845 Epoch: 1 Global Step: 20010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:30,152-Speed 3342.22 samples/sec Loss 9.8367 LearningRate 0.0845 Epoch: 1 Global Step: 20020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:33,208-Speed 3351.09 samples/sec Loss 9.7571 LearningRate 0.0845 Epoch: 1 Global Step: 20030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:36,253-Speed 3363.92 samples/sec Loss 9.7880 LearningRate 0.0845 Epoch: 1 Global Step: 20040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:39,267-Speed 3398.62 samples/sec Loss 
9.6983 LearningRate 0.0845 Epoch: 1 Global Step: 20050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:42,285-Speed 3394.69 samples/sec Loss 9.7116 LearningRate 0.0845 Epoch: 1 Global Step: 20060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:45,297-Speed 3400.72 samples/sec Loss 9.9285 LearningRate 0.0845 Epoch: 1 Global Step: 20070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 03:29:48,302-Speed 3408.65 samples/sec Loss 9.7954 LearningRate 0.0845 Epoch: 1 Global Step: 20080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:51,319-Speed 3395.44 samples/sec Loss 9.7055 LearningRate 0.0845 Epoch: 1 Global Step: 20090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:54,348-Speed 3381.73 samples/sec Loss 9.8115 LearningRate 0.0845 Epoch: 1 Global Step: 20100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 03:29:57,398-Speed 3358.83 samples/sec Loss 9.7083 LearningRate 0.0845 Epoch: 1 Global Step: 20110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:30:00,459-Speed 3345.50 samples/sec Loss 9.8866 LearningRate 0.0845 Epoch: 1 Global Step: 20120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:30:03,515-Speed 3352.14 samples/sec Loss 9.7676 LearningRate 0.0845 Epoch: 1 Global Step: 20130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:30:06,659-Speed 3258.37 samples/sec Loss 9.7614 LearningRate 0.0844 Epoch: 1 Global Step: 20140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:30:09,682-Speed 3387.59 samples/sec Loss 9.6366 LearningRate 0.0844 Epoch: 1 Global Step: 20150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:30:12,713-Speed 3380.13 samples/sec Loss 9.7687 LearningRate 0.0844 Epoch: 1 Global Step: 20160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:30:15,728-Speed 3396.55 samples/sec Loss 9.7319 LearningRate 0.0844 Epoch: 1 Global Step: 20170 
Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:18,792-Speed 3343.37 samples/sec Loss 9.7148 LearningRate 0.0844 Epoch: 1 Global Step: 20180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:21,821-Speed 3381.95 samples/sec Loss 9.7883 LearningRate 0.0844 Epoch: 1 Global Step: 20190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:24,857-Speed 3373.95 samples/sec Loss 9.6684 LearningRate 0.0844 Epoch: 1 Global Step: 20200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:27,893-Speed 3373.95 samples/sec Loss 9.8348 LearningRate 0.0844 Epoch: 1 Global Step: 20210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:30,918-Speed 3385.48 samples/sec Loss 9.7084 LearningRate 0.0844 Epoch: 1 Global Step: 20220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:33,930-Speed 3401.32 samples/sec Loss 9.8988 LearningRate 0.0844 Epoch: 1 Global Step: 20230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:37,037-Speed 3296.33 samples/sec Loss 9.8081 LearningRate 0.0844 Epoch: 1 Global Step: 20240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:40,092-Speed 3352.75 samples/sec Loss 9.7662 LearningRate 0.0844 Epoch: 1 Global Step: 20250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:43,153-Speed 3346.21 samples/sec Loss 9.6791 LearningRate 0.0844 Epoch: 1 Global Step: 20260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:46,194-Speed 3369.34 samples/sec Loss 9.7482 LearningRate 0.0843 Epoch: 1 Global Step: 20270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:49,206-Speed 3400.58 samples/sec Loss 9.6429 LearningRate 0.0843 Epoch: 1 Global Step: 20280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:52,246-Speed 3368.89 samples/sec Loss 9.6434 LearningRate 0.0843 Epoch: 1 Global Step: 20290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 
2022-04-27 03:30:55,310-Speed 3343.87 samples/sec Loss 9.7203 LearningRate 0.0843 Epoch: 1 Global Step: 20300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:30:58,316-Speed 3407.57 samples/sec Loss 9.7720 LearningRate 0.0843 Epoch: 1 Global Step: 20310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:31:01,396-Speed 3325.47 samples/sec Loss 9.7293 LearningRate 0.0843 Epoch: 1 Global Step: 20320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:31:04,431-Speed 3375.04 samples/sec Loss 9.7501 LearningRate 0.0843 Epoch: 1 Global Step: 20330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:31:07,472-Speed 3369.32 samples/sec Loss 9.7925 LearningRate 0.0843 Epoch: 1 Global Step: 20340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:31:10,492-Speed 3391.90 samples/sec Loss 9.7635 LearningRate 0.0843 Epoch: 1 Global Step: 20350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:31:13,553-Speed 3345.95 samples/sec Loss 9.6419 LearningRate 0.0843 Epoch: 1 Global Step: 20360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:31:16,626-Speed 3334.02 samples/sec Loss 9.7165 LearningRate 0.0843 Epoch: 1 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:19,633-Speed 3405.34 samples/sec Loss 9.7335 LearningRate 0.0843 Epoch: 1 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:22,673-Speed 3369.86 samples/sec Loss 9.6554 LearningRate 0.0843 Epoch: 1 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:25,783-Speed 3293.63 samples/sec Loss 9.7861 LearningRate 0.0843 Epoch: 1 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:28,844-Speed 3347.06 samples/sec Loss 9.7667 LearningRate 0.0842 Epoch: 1 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:31,870-Speed 3384.01 samples/sec Loss 
9.9012 LearningRate 0.0842 Epoch: 1 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:34,899-Speed 3382.98 samples/sec Loss 9.6543 LearningRate 0.0842 Epoch: 1 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:37,951-Speed 3355.18 samples/sec Loss 9.7171 LearningRate 0.0842 Epoch: 1 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:41,792-Speed 2666.45 samples/sec Loss 9.7737 LearningRate 0.0842 Epoch: 1 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:44,821-Speed 3382.72 samples/sec Loss 9.8036 LearningRate 0.0842 Epoch: 1 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:47,833-Speed 3401.10 samples/sec Loss 9.6985 LearningRate 0.0842 Epoch: 1 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 03:31:50,855-Speed 3389.07 samples/sec Loss 9.6096 LearningRate 0.0842 Epoch: 1 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:53,911-Speed 3351.77 samples/sec Loss 9.7947 LearningRate 0.0842 Epoch: 1 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:56,926-Speed 3396.87 samples/sec Loss 9.7214 LearningRate 0.0842 Epoch: 1 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:31:59,984-Speed 3350.22 samples/sec Loss 9.7352 LearningRate 0.0842 Epoch: 1 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:03,040-Speed 3351.18 samples/sec Loss 9.8811 LearningRate 0.0842 Epoch: 1 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:06,111-Speed 3336.14 samples/sec Loss 9.6212 LearningRate 0.0842 Epoch: 1 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:09,138-Speed 3383.43 samples/sec Loss 9.7484 LearningRate 0.0841 Epoch: 1 Global Step: 20540 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:12,174-Speed 3373.79 samples/sec Loss 9.6378 LearningRate 0.0841 Epoch: 1 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:15,214-Speed 3369.95 samples/sec Loss 9.6903 LearningRate 0.0841 Epoch: 1 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:18,245-Speed 3378.90 samples/sec Loss 9.6341 LearningRate 0.0841 Epoch: 1 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:21,294-Speed 3360.22 samples/sec Loss 9.7708 LearningRate 0.0841 Epoch: 1 Global Step: 20580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 03:32:24,323-Speed 3381.82 samples/sec Loss 9.6693 LearningRate 0.0841 Epoch: 1 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:27,392-Speed 3336.91 samples/sec Loss 9.7790 LearningRate 0.0841 Epoch: 1 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:30,423-Speed 3380.03 samples/sec Loss 9.6942 LearningRate 0.0841 Epoch: 1 Global Step: 20610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:33,452-Speed 3381.91 samples/sec Loss 9.6339 LearningRate 0.0841 Epoch: 1 Global Step: 20620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:36,534-Speed 3323.76 samples/sec Loss 9.7631 LearningRate 0.0841 Epoch: 1 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:39,571-Speed 3372.55 samples/sec Loss 9.7252 LearningRate 0.0841 Epoch: 1 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:42,642-Speed 3335.00 samples/sec Loss 9.5847 LearningRate 0.0841 Epoch: 1 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:45,689-Speed 3362.06 samples/sec Loss 9.6997 LearningRate 0.0841 Epoch: 1 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-04-27 03:32:48,745-Speed 3352.01 samples/sec Loss 9.5930 LearningRate 0.0841 Epoch: 1 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:51,763-Speed 3393.65 samples/sec Loss 9.8413 LearningRate 0.0840 Epoch: 1 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:32:54,840-Speed 3329.54 samples/sec Loss 9.5015 LearningRate 0.0840 Epoch: 1 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 03:32:57,872-Speed 3378.91 samples/sec Loss 9.7550 LearningRate 0.0840 Epoch: 1 Global Step: 20700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 03:33:00,919-Speed 3361.35 samples/sec Loss 9.6440 LearningRate 0.0840 Epoch: 1 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:33:03,958-Speed 3371.02 samples/sec Loss 9.7190 LearningRate 0.0840 Epoch: 1 Global Step: 20720 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:06,996-Speed 3371.86 samples/sec Loss 9.5887 LearningRate 0.0840 Epoch: 1 Global Step: 20730 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:10,031-Speed 3374.74 samples/sec Loss 9.8286 LearningRate 0.0840 Epoch: 1 Global Step: 20740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:13,073-Speed 3366.62 samples/sec Loss 9.8405 LearningRate 0.0840 Epoch: 1 Global Step: 20750 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:16,166-Speed 3312.26 samples/sec Loss 9.7278 LearningRate 0.0840 Epoch: 1 Global Step: 20760 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:21,771-Speed 1827.36 samples/sec Loss 9.6914 LearningRate 0.0840 Epoch: 1 Global Step: 20770 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:24,795-Speed 3386.81 samples/sec Loss 9.7600 LearningRate 0.0840 Epoch: 1 Global Step: 20780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:27,820-Speed 3386.17 samples/sec 
Loss 9.8534 LearningRate 0.0840 Epoch: 1 Global Step: 20790 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:30,872-Speed 3356.07 samples/sec Loss 9.5342 LearningRate 0.0840 Epoch: 1 Global Step: 20800 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:33,905-Speed 3377.48 samples/sec Loss 9.7635 LearningRate 0.0839 Epoch: 1 Global Step: 20810 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:33:36,975-Speed 3336.60 samples/sec Loss 9.6927 LearningRate 0.0839 Epoch: 1 Global Step: 20820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:33:40,022-Speed 3361.31 samples/sec Loss 9.7463 LearningRate 0.0839 Epoch: 1 Global Step: 20830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:33:43,073-Speed 3357.87 samples/sec Loss 9.5037 LearningRate 0.0839 Epoch: 1 Global Step: 20840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:33:46,108-Speed 3374.92 samples/sec Loss 9.6863 LearningRate 0.0839 Epoch: 1 Global Step: 20850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:33:49,141-Speed 3377.41 samples/sec Loss 9.6127 LearningRate 0.0839 Epoch: 1 Global Step: 20860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:33:52,211-Speed 3335.85 samples/sec Loss 9.6980 LearningRate 0.0839 Epoch: 1 Global Step: 20870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:33:55,321-Speed 3294.35 samples/sec Loss 9.5664 LearningRate 0.0839 Epoch: 1 Global Step: 20880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:33:58,354-Speed 3377.26 samples/sec Loss 9.5572 LearningRate 0.0839 Epoch: 1 Global Step: 20890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:34:01,412-Speed 3349.70 samples/sec Loss 9.5651 LearningRate 0.0839 Epoch: 1 Global Step: 20900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:34:04,489-Speed 3329.05 samples/sec Loss 9.7397 LearningRate 0.0839 Epoch: 1 Global Step: 
20910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:34:07,563-Speed 3331.62 samples/sec Loss 9.7247 LearningRate 0.0839 Epoch: 1 Global Step: 20920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:34:10,642-Speed 3327.01 samples/sec Loss 9.7357 LearningRate 0.0839 Epoch: 1 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:34:13,691-Speed 3359.64 samples/sec Loss 9.7500 LearningRate 0.0839 Epoch: 1 Global Step: 20940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:34:16,733-Speed 3368.12 samples/sec Loss 9.6818 LearningRate 0.0838 Epoch: 1 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:34:19,797-Speed 3342.91 samples/sec Loss 9.7207 LearningRate 0.0838 Epoch: 1 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:34:22,834-Speed 3372.95 samples/sec Loss 9.6895 LearningRate 0.0838 Epoch: 1 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:34:25,928-Speed 3311.00 samples/sec Loss 9.7228 LearningRate 0.0838 Epoch: 1 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:34:28,982-Speed 3353.36 samples/sec Loss 9.5730 LearningRate 0.0838 Epoch: 1 Global Step: 20990 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:34:32,035-Speed 3355.17 samples/sec Loss 9.6888 LearningRate 0.0838 Epoch: 1 Global Step: 21000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:35,090-Speed 3353.69 samples/sec Loss 9.7709 LearningRate 0.0838 Epoch: 1 Global Step: 21010 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:38,163-Speed 3333.40 samples/sec Loss 9.6421 LearningRate 0.0838 Epoch: 1 Global Step: 21020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:41,221-Speed 3349.93 samples/sec Loss 9.5346 LearningRate 0.0838 Epoch: 1 Global Step: 21030 Fp16 Grad Scale: 16384 Required: 19 hours 
Training: 2022-04-27 03:34:44,247-Speed 3384.08 samples/sec Loss 9.5561 LearningRate 0.0838 Epoch: 1 Global Step: 21040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:47,289-Speed 3367.92 samples/sec Loss 9.6852 LearningRate 0.0838 Epoch: 1 Global Step: 21050 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:50,388-Speed 3305.67 samples/sec Loss 9.6878 LearningRate 0.0838 Epoch: 1 Global Step: 21060 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:53,444-Speed 3351.60 samples/sec Loss 9.6879 LearningRate 0.0838 Epoch: 1 Global Step: 21070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:56,496-Speed 3355.99 samples/sec Loss 9.8461 LearningRate 0.0837 Epoch: 1 Global Step: 21080 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:34:59,562-Speed 3341.34 samples/sec Loss 9.6009 LearningRate 0.0837 Epoch: 1 Global Step: 21090 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:35:02,618-Speed 3351.51 samples/sec Loss 9.5545 LearningRate 0.0837 Epoch: 1 Global Step: 21100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:05,684-Speed 3341.03 samples/sec Loss 9.6187 LearningRate 0.0837 Epoch: 1 Global Step: 21110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:08,735-Speed 3357.92 samples/sec Loss 9.5771 LearningRate 0.0837 Epoch: 1 Global Step: 21120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:11,751-Speed 3396.24 samples/sec Loss 9.6018 LearningRate 0.0837 Epoch: 1 Global Step: 21130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:14,820-Speed 3337.63 samples/sec Loss 9.6188 LearningRate 0.0837 Epoch: 1 Global Step: 21140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:17,881-Speed 3346.20 samples/sec Loss 9.6791 LearningRate 0.0837 Epoch: 1 Global Step: 21150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:20,898-Speed 3394.93 
samples/sec Loss 9.6446 LearningRate 0.0837 Epoch: 1 Global Step: 21160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:23,942-Speed 3364.79 samples/sec Loss 9.6708 LearningRate 0.0837 Epoch: 1 Global Step: 21170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:26,951-Speed 3404.62 samples/sec Loss 9.5929 LearningRate 0.0837 Epoch: 1 Global Step: 21180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:29,998-Speed 3362.11 samples/sec Loss 9.6711 LearningRate 0.0837 Epoch: 1 Global Step: 21190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:33,002-Speed 3409.49 samples/sec Loss 9.5532 LearningRate 0.0837 Epoch: 1 Global Step: 21200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:35:36,078-Speed 3331.10 samples/sec Loss 9.6638 LearningRate 0.0837 Epoch: 1 Global Step: 21210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:39,115-Speed 3372.60 samples/sec Loss 9.5368 LearningRate 0.0836 Epoch: 1 Global Step: 21220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:42,170-Speed 3352.72 samples/sec Loss 9.5374 LearningRate 0.0836 Epoch: 1 Global Step: 21230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:45,181-Speed 3402.58 samples/sec Loss 9.6566 LearningRate 0.0836 Epoch: 1 Global Step: 21240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:48,181-Speed 3413.51 samples/sec Loss 9.5991 LearningRate 0.0836 Epoch: 1 Global Step: 21250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:51,232-Speed 3357.38 samples/sec Loss 9.5736 LearningRate 0.0836 Epoch: 1 Global Step: 21260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:54,287-Speed 3353.52 samples/sec Loss 9.5370 LearningRate 0.0836 Epoch: 1 Global Step: 21270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:35:57,313-Speed 3385.20 samples/sec Loss 9.6887 LearningRate 0.0836 Epoch: 1 
Global Step: 21280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:00,331-Speed 3393.99 samples/sec Loss 9.4805 LearningRate 0.0836 Epoch: 1 Global Step: 21290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:03,339-Speed 3404.96 samples/sec Loss 9.4110 LearningRate 0.0836 Epoch: 1 Global Step: 21300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:06,391-Speed 3356.20 samples/sec Loss 9.6910 LearningRate 0.0836 Epoch: 1 Global Step: 21310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:36:09,399-Speed 3405.21 samples/sec Loss 9.6755 LearningRate 0.0836 Epoch: 1 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:36:12,424-Speed 3386.13 samples/sec Loss 9.6317 LearningRate 0.0836 Epoch: 1 Global Step: 21330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:36:15,465-Speed 3369.59 samples/sec Loss 9.5017 LearningRate 0.0836 Epoch: 1 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:36:18,478-Speed 3399.40 samples/sec Loss 9.6552 LearningRate 0.0835 Epoch: 1 Global Step: 21350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:36:21,493-Speed 3397.71 samples/sec Loss 9.5477 LearningRate 0.0835 Epoch: 1 Global Step: 21360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:24,546-Speed 3355.09 samples/sec Loss 9.5873 LearningRate 0.0835 Epoch: 1 Global Step: 21370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:27,586-Speed 3369.29 samples/sec Loss 9.5180 LearningRate 0.0835 Epoch: 1 Global Step: 21380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:30,584-Speed 3416.69 samples/sec Loss 9.5766 LearningRate 0.0835 Epoch: 1 Global Step: 21390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:33,601-Speed 3395.02 samples/sec Loss 9.5792 LearningRate 0.0835 Epoch: 1 Global Step: 21400 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:36:36,648-Speed 3361.53 samples/sec Loss 9.4246 LearningRate 0.0835 Epoch: 1 Global Step: 21410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:39,671-Speed 3388.68 samples/sec Loss 9.6917 LearningRate 0.0835 Epoch: 1 Global Step: 21420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:42,721-Speed 3359.11 samples/sec Loss 9.6729 LearningRate 0.0835 Epoch: 1 Global Step: 21430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:45,719-Speed 3416.88 samples/sec Loss 9.7871 LearningRate 0.0835 Epoch: 1 Global Step: 21440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:48,739-Speed 3391.31 samples/sec Loss 9.5441 LearningRate 0.0835 Epoch: 1 Global Step: 21450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:36:51,847-Speed 3295.92 samples/sec Loss 9.6728 LearningRate 0.0835 Epoch: 1 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:36:54,940-Speed 3312.07 samples/sec Loss 9.5539 LearningRate 0.0835 Epoch: 1 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:36:57,987-Speed 3362.24 samples/sec Loss 9.5947 LearningRate 0.0835 Epoch: 1 Global Step: 21480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:00,991-Speed 3409.81 samples/sec Loss 9.5846 LearningRate 0.0834 Epoch: 1 Global Step: 21490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:04,038-Speed 3362.17 samples/sec Loss 9.6718 LearningRate 0.0834 Epoch: 1 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:07,095-Speed 3350.53 samples/sec Loss 9.6305 LearningRate 0.0834 Epoch: 1 Global Step: 21510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:10,106-Speed 3401.51 samples/sec Loss 9.6631 LearningRate 0.0834 Epoch: 1 Global Step: 21520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:13,208-Speed 3302.22 
samples/sec Loss 9.6698 LearningRate 0.0834 Epoch: 1 Global Step: 21530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:16,230-Speed 3390.49 samples/sec Loss 9.5163 LearningRate 0.0834 Epoch: 1 Global Step: 21540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:19,272-Speed 3366.64 samples/sec Loss 9.7164 LearningRate 0.0834 Epoch: 1 Global Step: 21550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:22,291-Speed 3392.86 samples/sec Loss 9.6187 LearningRate 0.0834 Epoch: 1 Global Step: 21560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:25,300-Speed 3404.87 samples/sec Loss 9.5589 LearningRate 0.0834 Epoch: 1 Global Step: 21570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:28,343-Speed 3366.36 samples/sec Loss 9.7111 LearningRate 0.0834 Epoch: 1 Global Step: 21580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:31,400-Speed 3350.77 samples/sec Loss 9.6561 LearningRate 0.0834 Epoch: 1 Global Step: 21590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:34,452-Speed 3355.85 samples/sec Loss 9.5591 LearningRate 0.0834 Epoch: 1 Global Step: 21600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:37,550-Speed 3306.72 samples/sec Loss 9.5895 LearningRate 0.0834 Epoch: 1 Global Step: 21610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:40,675-Speed 3277.99 samples/sec Loss 9.5428 LearningRate 0.0834 Epoch: 1 Global Step: 21620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:43,722-Speed 3361.80 samples/sec Loss 9.5535 LearningRate 0.0833 Epoch: 1 Global Step: 21630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:37:46,801-Speed 3326.18 samples/sec Loss 9.4439 LearningRate 0.0833 Epoch: 1 Global Step: 21640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:49,837-Speed 3374.15 samples/sec Loss 9.6198 LearningRate 0.0833 Epoch: 1 
Global Step: 21650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:52,894-Speed 3350.99 samples/sec Loss 9.6228 LearningRate 0.0833 Epoch: 1 Global Step: 21660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:55,977-Speed 3322.38 samples/sec Loss 9.5478 LearningRate 0.0833 Epoch: 1 Global Step: 21670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:37:59,026-Speed 3359.86 samples/sec Loss 9.4861 LearningRate 0.0833 Epoch: 1 Global Step: 21680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:02,087-Speed 3346.56 samples/sec Loss 9.5026 LearningRate 0.0833 Epoch: 1 Global Step: 21690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:05,111-Speed 3387.24 samples/sec Loss 9.5179 LearningRate 0.0833 Epoch: 1 Global Step: 21700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:08,122-Speed 3401.50 samples/sec Loss 9.6576 LearningRate 0.0833 Epoch: 1 Global Step: 21710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:11,119-Speed 3418.06 samples/sec Loss 9.5453 LearningRate 0.0833 Epoch: 1 Global Step: 21720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:14,206-Speed 3318.37 samples/sec Loss 9.4793 LearningRate 0.0833 Epoch: 1 Global Step: 21730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:17,249-Speed 3365.36 samples/sec Loss 9.5895 LearningRate 0.0833 Epoch: 1 Global Step: 21740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:20,227-Speed 3439.68 samples/sec Loss 9.5949 LearningRate 0.0833 Epoch: 1 Global Step: 21750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:23,235-Speed 3405.37 samples/sec Loss 9.5571 LearningRate 0.0832 Epoch: 1 Global Step: 21760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:26,239-Speed 3409.77 samples/sec Loss 9.5050 LearningRate 0.0832 Epoch: 1 Global Step: 21770 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:38:29,240-Speed 3413.27 samples/sec Loss 9.5307 LearningRate 0.0832 Epoch: 1 Global Step: 21780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:32,266-Speed 3384.97 samples/sec Loss 9.4711 LearningRate 0.0832 Epoch: 1 Global Step: 21790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:35,340-Speed 3331.74 samples/sec Loss 9.4617 LearningRate 0.0832 Epoch: 1 Global Step: 21800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:38,421-Speed 3324.65 samples/sec Loss 9.4684 LearningRate 0.0832 Epoch: 1 Global Step: 21810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:41,481-Speed 3347.87 samples/sec Loss 9.5667 LearningRate 0.0832 Epoch: 1 Global Step: 21820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:44,509-Speed 3382.65 samples/sec Loss 9.4222 LearningRate 0.0832 Epoch: 1 Global Step: 21830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:47,567-Speed 3350.22 samples/sec Loss 9.6718 LearningRate 0.0832 Epoch: 1 Global Step: 21840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:38:50,563-Speed 3418.87 samples/sec Loss 9.4784 LearningRate 0.0832 Epoch: 1 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:53,572-Speed 3404.38 samples/sec Loss 9.3825 LearningRate 0.0832 Epoch: 1 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:56,588-Speed 3396.18 samples/sec Loss 9.5777 LearningRate 0.0832 Epoch: 1 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:38:59,618-Speed 3380.44 samples/sec Loss 9.4324 LearningRate 0.0832 Epoch: 1 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:39:02,675-Speed 3350.95 samples/sec Loss 9.5886 LearningRate 0.0832 Epoch: 1 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:39:05,726-Speed 3358.27 
samples/sec Loss 9.3664 LearningRate 0.0831 Epoch: 1 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:39:08,754-Speed 3381.71 samples/sec Loss 9.5344 LearningRate 0.0831 Epoch: 1 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:39:11,830-Speed 3330.23 samples/sec Loss 9.5501 LearningRate 0.0831 Epoch: 1 Global Step: 21920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:14,857-Speed 3384.88 samples/sec Loss 9.5495 LearningRate 0.0831 Epoch: 1 Global Step: 21930 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:17,869-Speed 3400.18 samples/sec Loss 9.6850 LearningRate 0.0831 Epoch: 1 Global Step: 21940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:20,917-Speed 3360.96 samples/sec Loss 9.6271 LearningRate 0.0831 Epoch: 1 Global Step: 21950 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:23,992-Speed 3331.56 samples/sec Loss 9.3466 LearningRate 0.0831 Epoch: 1 Global Step: 21960 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:27,089-Speed 3306.89 samples/sec Loss 9.5797 LearningRate 0.0831 Epoch: 1 Global Step: 21970 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:30,187-Speed 3307.49 samples/sec Loss 9.3204 LearningRate 0.0831 Epoch: 1 Global Step: 21980 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:33,239-Speed 3355.71 samples/sec Loss 9.5271 LearningRate 0.0831 Epoch: 1 Global Step: 21990 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:36,269-Speed 3380.76 samples/sec Loss 9.5265 LearningRate 0.0831 Epoch: 1 Global Step: 22000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:39,341-Speed 3335.19 samples/sec Loss 9.4576 LearningRate 0.0831 Epoch: 1 Global Step: 22010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:39:42,350-Speed 3403.73 samples/sec Loss 9.5201 LearningRate 0.0831 Epoch: 1 
Global Step: 22020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:39:45,380-Speed 3381.11 samples/sec Loss 9.4934 LearningRate 0.0831 Epoch: 1 Global Step: 22030 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:39:48,455-Speed 3331.35 samples/sec Loss 9.4905 LearningRate 0.0830 Epoch: 1 Global Step: 22040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:39:51,505-Speed 3358.78 samples/sec Loss 9.4880 LearningRate 0.0830 Epoch: 1 Global Step: 22050 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:39:54,551-Speed 3362.21 samples/sec Loss 9.5122 LearningRate 0.0830 Epoch: 1 Global Step: 22060 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:39:57,580-Speed 3382.81 samples/sec Loss 9.5267 LearningRate 0.0830 Epoch: 1 Global Step: 22070 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:40:00,677-Speed 3307.26 samples/sec Loss 9.5646 LearningRate 0.0830 Epoch: 1 Global Step: 22080 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:40:03,722-Speed 3363.46 samples/sec Loss 9.4707 LearningRate 0.0830 Epoch: 1 Global Step: 22090 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:40:06,747-Speed 3386.52 samples/sec Loss 9.5233 LearningRate 0.0830 Epoch: 1 Global Step: 22100 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:40:09,783-Speed 3374.57 samples/sec Loss 9.5019 LearningRate 0.0830 Epoch: 1 Global Step: 22110 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:40:12,795-Speed 3399.94 samples/sec Loss 9.4528 LearningRate 0.0830 Epoch: 1 Global Step: 22120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:15,830-Speed 3375.04 samples/sec Loss 9.4026 LearningRate 0.0830 Epoch: 1 Global Step: 22130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:18,929-Speed 3305.90 samples/sec Loss 9.4669 LearningRate 0.0830 Epoch: 1 Global Step: 22140 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:40:21,958-Speed 3381.46 samples/sec Loss 9.4877 LearningRate 0.0830 Epoch: 1 Global Step: 22150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:24,981-Speed 3388.06 samples/sec Loss 9.4277 LearningRate 0.0830 Epoch: 1 Global Step: 22160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:28,020-Speed 3371.25 samples/sec Loss 9.4982 LearningRate 0.0829 Epoch: 1 Global Step: 22170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:31,044-Speed 3386.69 samples/sec Loss 9.5622 LearningRate 0.0829 Epoch: 1 Global Step: 22180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:34,071-Speed 3384.66 samples/sec Loss 9.5260 LearningRate 0.0829 Epoch: 1 Global Step: 22190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:37,108-Speed 3372.18 samples/sec Loss 9.4665 LearningRate 0.0829 Epoch: 1 Global Step: 22200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:40,110-Speed 3412.91 samples/sec Loss 9.5136 LearningRate 0.0829 Epoch: 1 Global Step: 22210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:43,148-Speed 3371.18 samples/sec Loss 9.5427 LearningRate 0.0829 Epoch: 1 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:40:46,198-Speed 3359.12 samples/sec Loss 9.3395 LearningRate 0.0829 Epoch: 1 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:40:49,218-Speed 3391.41 samples/sec Loss 9.4933 LearningRate 0.0829 Epoch: 1 Global Step: 22240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:52,290-Speed 3334.76 samples/sec Loss 9.5898 LearningRate 0.0829 Epoch: 1 Global Step: 22250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:55,342-Speed 3356.32 samples/sec Loss 9.4377 LearningRate 0.0829 Epoch: 1 Global Step: 22260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:40:58,341-Speed 3415.12 
samples/sec Loss 9.4567 LearningRate 0.0829 Epoch: 1 Global Step: 22270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:01,399-Speed 3349.84 samples/sec Loss 9.6186 LearningRate 0.0829 Epoch: 1 Global Step: 22280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:04,501-Speed 3301.67 samples/sec Loss 9.5255 LearningRate 0.0829 Epoch: 1 Global Step: 22290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:07,585-Speed 3321.42 samples/sec Loss 9.5104 LearningRate 0.0829 Epoch: 1 Global Step: 22300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:10,615-Speed 3381.16 samples/sec Loss 9.4374 LearningRate 0.0828 Epoch: 1 Global Step: 22310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:13,685-Speed 3336.47 samples/sec Loss 9.3645 LearningRate 0.0828 Epoch: 1 Global Step: 22320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:16,763-Speed 3327.82 samples/sec Loss 9.3922 LearningRate 0.0828 Epoch: 1 Global Step: 22330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:19,813-Speed 3358.76 samples/sec Loss 9.4503 LearningRate 0.0828 Epoch: 1 Global Step: 22340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:22,835-Speed 3389.09 samples/sec Loss 9.5304 LearningRate 0.0828 Epoch: 1 Global Step: 22350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:25,904-Speed 3337.28 samples/sec Loss 9.5046 LearningRate 0.0828 Epoch: 1 Global Step: 22360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:28,992-Speed 3317.46 samples/sec Loss 9.4408 LearningRate 0.0828 Epoch: 1 Global Step: 22370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:32,021-Speed 3381.51 samples/sec Loss 9.4398 LearningRate 0.0828 Epoch: 1 Global Step: 22380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:35,053-Speed 3378.91 samples/sec Loss 9.2223 LearningRate 0.0828 Epoch: 1 
Global Step: 22390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:38,080-Speed 3383.43 samples/sec Loss 9.5064 LearningRate 0.0828 Epoch: 1 Global Step: 22400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:41,130-Speed 3359.03 samples/sec Loss 9.5652 LearningRate 0.0828 Epoch: 1 Global Step: 22410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:44,207-Speed 3329.19 samples/sec Loss 9.5867 LearningRate 0.0828 Epoch: 1 Global Step: 22420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:47,237-Speed 3380.42 samples/sec Loss 9.3555 LearningRate 0.0828 Epoch: 1 Global Step: 22430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:50,352-Speed 3288.19 samples/sec Loss 9.4128 LearningRate 0.0827 Epoch: 1 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:41:53,438-Speed 3319.23 samples/sec Loss 9.5013 LearningRate 0.0827 Epoch: 1 Global Step: 22450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:56,454-Speed 3396.81 samples/sec Loss 9.4627 LearningRate 0.0827 Epoch: 1 Global Step: 22460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:41:59,482-Speed 3382.72 samples/sec Loss 9.4267 LearningRate 0.0827 Epoch: 1 Global Step: 22470 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:02,531-Speed 3359.78 samples/sec Loss 9.4133 LearningRate 0.0827 Epoch: 1 Global Step: 22480 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:05,646-Speed 3288.16 samples/sec Loss 9.4655 LearningRate 0.0827 Epoch: 1 Global Step: 22490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:08,696-Speed 3358.34 samples/sec Loss 9.4809 LearningRate 0.0827 Epoch: 1 Global Step: 22500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:11,760-Speed 3343.32 samples/sec Loss 9.4601 LearningRate 0.0827 Epoch: 1 Global Step: 22510 Fp16 Grad Scale: 16384 Required: 19 
hours Training: 2022-04-27 03:42:14,782-Speed 3389.76 samples/sec Loss 9.2440 LearningRate 0.0827 Epoch: 1 Global Step: 22520 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:17,835-Speed 3355.40 samples/sec Loss 9.3596 LearningRate 0.0827 Epoch: 1 Global Step: 22530 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:20,864-Speed 3381.08 samples/sec Loss 9.4063 LearningRate 0.0827 Epoch: 1 Global Step: 22540 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:23,907-Speed 3366.64 samples/sec Loss 9.3001 LearningRate 0.0827 Epoch: 1 Global Step: 22550 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:26,971-Speed 3342.43 samples/sec Loss 9.5336 LearningRate 0.0827 Epoch: 1 Global Step: 22560 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:29,981-Speed 3403.60 samples/sec Loss 9.4052 LearningRate 0.0827 Epoch: 1 Global Step: 22570 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:32,999-Speed 3393.69 samples/sec Loss 9.3842 LearningRate 0.0826 Epoch: 1 Global Step: 22580 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:36,034-Speed 3375.67 samples/sec Loss 9.5616 LearningRate 0.0826 Epoch: 1 Global Step: 22590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:39,065-Speed 3378.68 samples/sec Loss 9.6326 LearningRate 0.0826 Epoch: 1 Global Step: 22600 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:42:42,118-Speed 3356.04 samples/sec Loss 9.5983 LearningRate 0.0826 Epoch: 1 Global Step: 22610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:45,127-Speed 3404.23 samples/sec Loss 9.4683 LearningRate 0.0826 Epoch: 1 Global Step: 22620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:48,152-Speed 3385.96 samples/sec Loss 9.5484 LearningRate 0.0826 Epoch: 1 Global Step: 22630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:51,186-Speed 3375.47 
samples/sec Loss 9.4726 LearningRate 0.0826 Epoch: 1 Global Step: 22640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:54,300-Speed 3290.14 samples/sec Loss 9.4539 LearningRate 0.0826 Epoch: 1 Global Step: 22650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:42:57,310-Speed 3402.81 samples/sec Loss 9.2898 LearningRate 0.0826 Epoch: 1 Global Step: 22660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:00,348-Speed 3371.25 samples/sec Loss 9.4269 LearningRate 0.0826 Epoch: 1 Global Step: 22670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:03,411-Speed 3343.51 samples/sec Loss 9.3017 LearningRate 0.0826 Epoch: 1 Global Step: 22680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:06,460-Speed 3360.33 samples/sec Loss 9.3273 LearningRate 0.0826 Epoch: 1 Global Step: 22690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:09,489-Speed 3381.48 samples/sec Loss 9.2780 LearningRate 0.0826 Epoch: 1 Global Step: 22700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:12,588-Speed 3305.43 samples/sec Loss 9.4615 LearningRate 0.0826 Epoch: 1 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:15,612-Speed 3387.40 samples/sec Loss 9.5317 LearningRate 0.0825 Epoch: 1 Global Step: 22720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:18,704-Speed 3312.08 samples/sec Loss 9.4065 LearningRate 0.0825 Epoch: 1 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:21,733-Speed 3382.01 samples/sec Loss 9.3853 LearningRate 0.0825 Epoch: 1 Global Step: 22740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:24,804-Speed 3335.98 samples/sec Loss 9.2953 LearningRate 0.0825 Epoch: 1 Global Step: 22750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:27,861-Speed 3350.75 samples/sec Loss 9.4167 LearningRate 0.0825 Epoch: 1 
Global Step: 22760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:30,955-Speed 3310.21 samples/sec Loss 9.3739 LearningRate 0.0825 Epoch: 1 Global Step: 22770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:33,980-Speed 3385.20 samples/sec Loss 9.4655 LearningRate 0.0825 Epoch: 1 Global Step: 22780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:37,055-Speed 3332.10 samples/sec Loss 9.4489 LearningRate 0.0825 Epoch: 1 Global Step: 22790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:40,105-Speed 3358.27 samples/sec Loss 9.2815 LearningRate 0.0825 Epoch: 1 Global Step: 22800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:43:43,163-Speed 3349.02 samples/sec Loss 9.4515 LearningRate 0.0825 Epoch: 1 Global Step: 22810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:46,197-Speed 3376.26 samples/sec Loss 9.3221 LearningRate 0.0825 Epoch: 1 Global Step: 22820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:49,226-Speed 3382.10 samples/sec Loss 9.4159 LearningRate 0.0825 Epoch: 1 Global Step: 22830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:52,278-Speed 3356.18 samples/sec Loss 9.3031 LearningRate 0.0825 Epoch: 1 Global Step: 22840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:55,328-Speed 3358.94 samples/sec Loss 9.3903 LearningRate 0.0824 Epoch: 1 Global Step: 22850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:43:58,338-Speed 3402.30 samples/sec Loss 9.3535 LearningRate 0.0824 Epoch: 1 Global Step: 22860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:01,462-Speed 3279.24 samples/sec Loss 9.4637 LearningRate 0.0824 Epoch: 1 Global Step: 22870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:04,499-Speed 3372.58 samples/sec Loss 9.3969 LearningRate 0.0824 Epoch: 1 Global Step: 22880 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:44:07,510-Speed 3401.66 samples/sec Loss 9.3536 LearningRate 0.0824 Epoch: 1 Global Step: 22890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:10,520-Speed 3403.50 samples/sec Loss 9.3428 LearningRate 0.0824 Epoch: 1 Global Step: 22900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:13,541-Speed 3390.92 samples/sec Loss 9.2849 LearningRate 0.0824 Epoch: 1 Global Step: 22910 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:16,597-Speed 3351.76 samples/sec Loss 9.2800 LearningRate 0.0824 Epoch: 1 Global Step: 22920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:19,602-Speed 3408.70 samples/sec Loss 9.3245 LearningRate 0.0824 Epoch: 1 Global Step: 22930 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:22,640-Speed 3371.59 samples/sec Loss 9.3173 LearningRate 0.0824 Epoch: 1 Global Step: 22940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:25,662-Speed 3389.67 samples/sec Loss 9.5493 LearningRate 0.0824 Epoch: 1 Global Step: 22950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:28,696-Speed 3376.04 samples/sec Loss 9.4005 LearningRate 0.0824 Epoch: 1 Global Step: 22960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:31,779-Speed 3322.09 samples/sec Loss 9.4993 LearningRate 0.0824 Epoch: 1 Global Step: 22970 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:34,856-Speed 3329.53 samples/sec Loss 9.3260 LearningRate 0.0824 Epoch: 1 Global Step: 22980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:37,952-Speed 3307.83 samples/sec Loss 9.3846 LearningRate 0.0823 Epoch: 1 Global Step: 22990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:41,065-Speed 3290.73 samples/sec Loss 9.4471 LearningRate 0.0823 Epoch: 1 Global Step: 23000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:44:44,136-Speed 3335.71 
samples/sec Loss 9.4558 LearningRate 0.0823 Epoch: 1 Global Step: 23010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:47,134-Speed 3416.80 samples/sec Loss 9.5097 LearningRate 0.0823 Epoch: 1 Global Step: 23020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:50,251-Speed 3286.54 samples/sec Loss 9.3426 LearningRate 0.0823 Epoch: 1 Global Step: 23030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:53,313-Speed 3345.03 samples/sec Loss 9.2649 LearningRate 0.0823 Epoch: 1 Global Step: 23040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:56,363-Speed 3359.09 samples/sec Loss 9.4111 LearningRate 0.0823 Epoch: 1 Global Step: 23050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:44:59,361-Speed 3415.63 samples/sec Loss 9.3337 LearningRate 0.0823 Epoch: 1 Global Step: 23060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:02,437-Speed 3330.31 samples/sec Loss 9.3323 LearningRate 0.0823 Epoch: 1 Global Step: 23070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:05,485-Speed 3360.84 samples/sec Loss 9.3079 LearningRate 0.0823 Epoch: 1 Global Step: 23080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:08,505-Speed 3391.64 samples/sec Loss 9.3951 LearningRate 0.0823 Epoch: 1 Global Step: 23090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:11,587-Speed 3324.05 samples/sec Loss 9.5179 LearningRate 0.0823 Epoch: 1 Global Step: 23100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:14,616-Speed 3381.88 samples/sec Loss 9.3720 LearningRate 0.0823 Epoch: 1 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:45:17,686-Speed 3335.66 samples/sec Loss 9.4074 LearningRate 0.0823 Epoch: 1 Global Step: 23120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:20,749-Speed 3344.40 samples/sec Loss 9.5002 LearningRate 0.0822 Epoch: 1 
Global Step: 23130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:23,801-Speed 3356.63 samples/sec Loss 9.3851 LearningRate 0.0822 Epoch: 1 Global Step: 23140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:26,825-Speed 3386.81 samples/sec Loss 9.4494 LearningRate 0.0822 Epoch: 1 Global Step: 23150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:29,862-Speed 3372.40 samples/sec Loss 9.3132 LearningRate 0.0822 Epoch: 1 Global Step: 23160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:32,925-Speed 3345.31 samples/sec Loss 9.3407 LearningRate 0.0822 Epoch: 1 Global Step: 23170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:36,021-Speed 3308.18 samples/sec Loss 9.3980 LearningRate 0.0822 Epoch: 1 Global Step: 23180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:39,079-Speed 3349.91 samples/sec Loss 9.3698 LearningRate 0.0822 Epoch: 1 Global Step: 23190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:42,125-Speed 3362.53 samples/sec Loss 9.3619 LearningRate 0.0822 Epoch: 1 Global Step: 23200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:45,124-Speed 3416.20 samples/sec Loss 9.3534 LearningRate 0.0822 Epoch: 1 Global Step: 23210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:45:48,162-Speed 3372.31 samples/sec Loss 9.4370 LearningRate 0.0822 Epoch: 1 Global Step: 23220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:45:51,163-Speed 3413.32 samples/sec Loss 9.3976 LearningRate 0.0822 Epoch: 1 Global Step: 23230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:45:54,237-Speed 3332.45 samples/sec Loss 9.3236 LearningRate 0.0822 Epoch: 1 Global Step: 23240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:45:57,270-Speed 3376.82 samples/sec Loss 9.4359 LearningRate 0.0822 Epoch: 1 Global Step: 23250 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:46:00,357-Speed 3318.09 samples/sec Loss 9.2441 LearningRate 0.0822 Epoch: 1 Global Step: 23260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:03,430-Speed 3333.11 samples/sec Loss 9.3447 LearningRate 0.0821 Epoch: 1 Global Step: 23270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:06,510-Speed 3326.17 samples/sec Loss 9.2400 LearningRate 0.0821 Epoch: 1 Global Step: 23280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:09,537-Speed 3384.26 samples/sec Loss 9.4018 LearningRate 0.0821 Epoch: 1 Global Step: 23290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:12,669-Speed 3270.41 samples/sec Loss 9.4003 LearningRate 0.0821 Epoch: 1 Global Step: 23300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:15,743-Speed 3331.91 samples/sec Loss 9.3011 LearningRate 0.0821 Epoch: 1 Global Step: 23310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:18,874-Speed 3271.74 samples/sec Loss 9.2512 LearningRate 0.0821 Epoch: 1 Global Step: 23320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:21,899-Speed 3386.79 samples/sec Loss 9.3802 LearningRate 0.0821 Epoch: 1 Global Step: 23330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:24,945-Speed 3362.63 samples/sec Loss 9.2771 LearningRate 0.0821 Epoch: 1 Global Step: 23340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:27,975-Speed 3380.39 samples/sec Loss 9.3363 LearningRate 0.0821 Epoch: 1 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:46:31,088-Speed 3290.49 samples/sec Loss 9.2565 LearningRate 0.0821 Epoch: 1 Global Step: 23360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:46:34,073-Speed 3431.20 samples/sec Loss 9.4509 LearningRate 0.0821 Epoch: 1 Global Step: 23370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:37,141-Speed 3339.64 
samples/sec Loss 9.3148 LearningRate 0.0821 Epoch: 1 Global Step: 23380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:40,216-Speed 3330.90 samples/sec Loss 9.3114 LearningRate 0.0821 Epoch: 1 Global Step: 23390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:46:43,268-Speed 3356.19 samples/sec Loss 9.3835 LearningRate 0.0820 Epoch: 1 Global Step: 23400 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:46:46,286-Speed 3395.14 samples/sec Loss 9.3675 LearningRate 0.0820 Epoch: 1 Global Step: 23410 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:46:49,330-Speed 3364.96 samples/sec Loss 9.2313 LearningRate 0.0820 Epoch: 1 Global Step: 23420 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:46:52,365-Speed 3374.72 samples/sec Loss 9.3415 LearningRate 0.0820 Epoch: 1 Global Step: 23430 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:46:55,375-Speed 3403.14 samples/sec Loss 9.2726 LearningRate 0.0820 Epoch: 1 Global Step: 23440 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:46:58,440-Speed 3342.41 samples/sec Loss 9.3209 LearningRate 0.0820 Epoch: 1 Global Step: 23450 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:01,484-Speed 3364.28 samples/sec Loss 9.2779 LearningRate 0.0820 Epoch: 1 Global Step: 23460 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:04,516-Speed 3378.52 samples/sec Loss 9.3445 LearningRate 0.0820 Epoch: 1 Global Step: 23470 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:07,596-Speed 3325.49 samples/sec Loss 9.2502 LearningRate 0.0820 Epoch: 1 Global Step: 23480 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:10,625-Speed 3382.75 samples/sec Loss 9.4315 LearningRate 0.0820 Epoch: 1 Global Step: 23490 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:13,673-Speed 3360.48 samples/sec Loss 9.2816 LearningRate 0.0820 Epoch: 1 
Global Step: 23500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:47:16,710-Speed 3372.81 samples/sec Loss 9.3061 LearningRate 0.0820 Epoch: 1 Global Step: 23510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:47:19,747-Speed 3372.13 samples/sec Loss 9.3033 LearningRate 0.0820 Epoch: 1 Global Step: 23520 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:22,772-Speed 3387.29 samples/sec Loss 9.3701 LearningRate 0.0820 Epoch: 1 Global Step: 23530 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:25,812-Speed 3368.57 samples/sec Loss 9.2995 LearningRate 0.0819 Epoch: 1 Global Step: 23540 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:28,885-Speed 3333.91 samples/sec Loss 9.4093 LearningRate 0.0819 Epoch: 1 Global Step: 23550 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:31,903-Speed 3394.14 samples/sec Loss 9.2348 LearningRate 0.0819 Epoch: 1 Global Step: 23560 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:35,002-Speed 3304.72 samples/sec Loss 9.4142 LearningRate 0.0819 Epoch: 1 Global Step: 23570 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:38,031-Speed 3381.89 samples/sec Loss 9.2773 LearningRate 0.0819 Epoch: 1 Global Step: 23580 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:41,053-Speed 3389.91 samples/sec Loss 9.2936 LearningRate 0.0819 Epoch: 1 Global Step: 23590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:44,068-Speed 3398.20 samples/sec Loss 9.3121 LearningRate 0.0819 Epoch: 1 Global Step: 23600 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:47,110-Speed 3367.05 samples/sec Loss 9.4444 LearningRate 0.0819 Epoch: 1 Global Step: 23610 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:47:50,152-Speed 3366.97 samples/sec Loss 9.3958 LearningRate 0.0819 Epoch: 1 Global Step: 23620 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:47:53,197-Speed 3364.54 samples/sec Loss 9.2652 LearningRate 0.0819 Epoch: 1 Global Step: 23630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:47:56,237-Speed 3369.53 samples/sec Loss 9.2514 LearningRate 0.0819 Epoch: 1 Global Step: 23640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:47:59,322-Speed 3320.15 samples/sec Loss 9.4228 LearningRate 0.0819 Epoch: 1 Global Step: 23650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:02,473-Speed 3250.89 samples/sec Loss 9.3877 LearningRate 0.0819 Epoch: 1 Global Step: 23660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:05,517-Speed 3364.60 samples/sec Loss 9.2007 LearningRate 0.0819 Epoch: 1 Global Step: 23670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:08,524-Speed 3406.90 samples/sec Loss 9.1816 LearningRate 0.0818 Epoch: 1 Global Step: 23680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:11,554-Speed 3379.77 samples/sec Loss 9.2044 LearningRate 0.0818 Epoch: 1 Global Step: 23690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:14,624-Speed 3336.97 samples/sec Loss 9.2743 LearningRate 0.0818 Epoch: 1 Global Step: 23700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:17,640-Speed 3395.98 samples/sec Loss 9.4254 LearningRate 0.0818 Epoch: 1 Global Step: 23710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:20,650-Speed 3403.42 samples/sec Loss 9.3323 LearningRate 0.0818 Epoch: 1 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:48:23,664-Speed 3398.81 samples/sec Loss 9.2853 LearningRate 0.0818 Epoch: 1 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:48:26,666-Speed 3411.68 samples/sec Loss 9.2991 LearningRate 0.0818 Epoch: 1 Global Step: 23740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:48:29,717-Speed 3357.68 
samples/sec Loss 9.3696 LearningRate 0.0818 Epoch: 1 Global Step: 23750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:48:32,743-Speed 3385.61 samples/sec Loss 9.4096 LearningRate 0.0818 Epoch: 1 Global Step: 23760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:35,821-Speed 3327.32 samples/sec Loss 9.3220 LearningRate 0.0818 Epoch: 1 Global Step: 23770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:38,827-Speed 3407.95 samples/sec Loss 9.2786 LearningRate 0.0818 Epoch: 1 Global Step: 23780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:41,853-Speed 3385.05 samples/sec Loss 9.3387 LearningRate 0.0818 Epoch: 1 Global Step: 23790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:44,866-Speed 3399.87 samples/sec Loss 9.3037 LearningRate 0.0818 Epoch: 1 Global Step: 23800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:47,872-Speed 3407.62 samples/sec Loss 9.1852 LearningRate 0.0817 Epoch: 1 Global Step: 23810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:50,921-Speed 3359.11 samples/sec Loss 9.3383 LearningRate 0.0817 Epoch: 1 Global Step: 23820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:53,991-Speed 3336.26 samples/sec Loss 9.1772 LearningRate 0.0817 Epoch: 1 Global Step: 23830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:48:57,016-Speed 3386.94 samples/sec Loss 9.2262 LearningRate 0.0817 Epoch: 1 Global Step: 23840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:00,091-Speed 3330.69 samples/sec Loss 9.3300 LearningRate 0.0817 Epoch: 1 Global Step: 23850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:03,132-Speed 3368.17 samples/sec Loss 9.2932 LearningRate 0.0817 Epoch: 1 Global Step: 23860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:06,177-Speed 3364.69 samples/sec Loss 9.3395 LearningRate 0.0817 Epoch: 1 
Global Step: 23870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:09,196-Speed 3392.26 samples/sec Loss 9.3122 LearningRate 0.0817 Epoch: 1 Global Step: 23880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:12,275-Speed 3327.19 samples/sec Loss 9.1555 LearningRate 0.0817 Epoch: 1 Global Step: 23890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:15,349-Speed 3332.00 samples/sec Loss 9.2794 LearningRate 0.0817 Epoch: 1 Global Step: 23900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:18,469-Speed 3283.30 samples/sec Loss 9.3048 LearningRate 0.0817 Epoch: 1 Global Step: 23910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:21,492-Speed 3388.56 samples/sec Loss 9.3004 LearningRate 0.0817 Epoch: 1 Global Step: 23920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:24,598-Speed 3297.99 samples/sec Loss 9.3107 LearningRate 0.0817 Epoch: 1 Global Step: 23930 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:27,647-Speed 3359.94 samples/sec Loss 9.1487 LearningRate 0.0817 Epoch: 1 Global Step: 23940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:30,734-Speed 3317.04 samples/sec Loss 9.1623 LearningRate 0.0816 Epoch: 1 Global Step: 23950 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:33,768-Speed 3376.43 samples/sec Loss 9.1281 LearningRate 0.0816 Epoch: 1 Global Step: 23960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:49:36,885-Speed 3286.91 samples/sec Loss 9.3220 LearningRate 0.0816 Epoch: 1 Global Step: 23970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:49:39,928-Speed 3365.30 samples/sec Loss 9.3120 LearningRate 0.0816 Epoch: 1 Global Step: 23980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:49:42,980-Speed 3357.22 samples/sec Loss 9.2172 LearningRate 0.0816 Epoch: 1 Global Step: 23990 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:49:45,991-Speed 3401.33 samples/sec Loss 9.1878 LearningRate 0.0816 Epoch: 1 Global Step: 24000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:49,056-Speed 3342.92 samples/sec Loss 9.3521 LearningRate 0.0816 Epoch: 1 Global Step: 24010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:52,091-Speed 3375.01 samples/sec Loss 9.2557 LearningRate 0.0816 Epoch: 1 Global Step: 24020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:55,104-Speed 3400.19 samples/sec Loss 9.1154 LearningRate 0.0816 Epoch: 1 Global Step: 24030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:49:58,113-Speed 3403.56 samples/sec Loss 9.3479 LearningRate 0.0816 Epoch: 1 Global Step: 24040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:01,152-Speed 3371.17 samples/sec Loss 9.2439 LearningRate 0.0816 Epoch: 1 Global Step: 24050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:04,201-Speed 3359.13 samples/sec Loss 9.2937 LearningRate 0.0816 Epoch: 1 Global Step: 24060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:07,277-Speed 3329.71 samples/sec Loss 9.3226 LearningRate 0.0816 Epoch: 1 Global Step: 24070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:10,263-Speed 3430.73 samples/sec Loss 9.1767 LearningRate 0.0816 Epoch: 1 Global Step: 24080 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:13,273-Speed 3403.06 samples/sec Loss 9.2144 LearningRate 0.0815 Epoch: 1 Global Step: 24090 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:16,395-Speed 3281.76 samples/sec Loss 9.2398 LearningRate 0.0815 Epoch: 1 Global Step: 24100 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:19,393-Speed 3415.42 samples/sec Loss 9.2502 LearningRate 0.0815 Epoch: 1 Global Step: 24110 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:22,498-Speed 3300.00 
samples/sec Loss 9.1600 LearningRate 0.0815 Epoch: 1 Global Step: 24120 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:25,559-Speed 3346.08 samples/sec Loss 9.1278 LearningRate 0.0815 Epoch: 1 Global Step: 24130 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:28,729-Speed 3231.74 samples/sec Loss 9.2941 LearningRate 0.0815 Epoch: 1 Global Step: 24140 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:31,779-Speed 3357.93 samples/sec Loss 9.3047 LearningRate 0.0815 Epoch: 1 Global Step: 24150 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:34,835-Speed 3352.54 samples/sec Loss 9.2932 LearningRate 0.0815 Epoch: 1 Global Step: 24160 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:37,851-Speed 3396.77 samples/sec Loss 9.2687 LearningRate 0.0815 Epoch: 1 Global Step: 24170 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:50:40,917-Speed 3339.80 samples/sec Loss 9.2944 LearningRate 0.0815 Epoch: 1 Global Step: 24180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:43,965-Speed 3360.99 samples/sec Loss 9.2088 LearningRate 0.0815 Epoch: 1 Global Step: 24190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:47,001-Speed 3374.65 samples/sec Loss 9.2734 LearningRate 0.0815 Epoch: 1 Global Step: 24200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:50,115-Speed 3289.40 samples/sec Loss 9.2970 LearningRate 0.0815 Epoch: 1 Global Step: 24210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:53,180-Speed 3341.74 samples/sec Loss 9.1916 LearningRate 0.0815 Epoch: 1 Global Step: 24220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:56,235-Speed 3352.60 samples/sec Loss 9.2861 LearningRate 0.0814 Epoch: 1 Global Step: 24230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:50:59,283-Speed 3361.28 samples/sec Loss 9.2564 LearningRate 0.0814 Epoch: 1 
Global Step: 24240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:02,328-Speed 3363.52 samples/sec Loss 9.1633 LearningRate 0.0814 Epoch: 1 Global Step: 24250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:05,415-Speed 3318.54 samples/sec Loss 9.2768 LearningRate 0.0814 Epoch: 1 Global Step: 24260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:08,453-Speed 3371.46 samples/sec Loss 9.1350 LearningRate 0.0814 Epoch: 1 Global Step: 24270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:11,521-Speed 3338.85 samples/sec Loss 9.1320 LearningRate 0.0814 Epoch: 1 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:51:14,584-Speed 3343.97 samples/sec Loss 9.2692 LearningRate 0.0814 Epoch: 1 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:51:17,590-Speed 3408.48 samples/sec Loss 9.1658 LearningRate 0.0814 Epoch: 1 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:51:20,577-Speed 3428.39 samples/sec Loss 9.2988 LearningRate 0.0814 Epoch: 1 Global Step: 24310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:23,595-Speed 3394.13 samples/sec Loss 9.2384 LearningRate 0.0814 Epoch: 1 Global Step: 24320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:26,610-Speed 3399.34 samples/sec Loss 9.1405 LearningRate 0.0814 Epoch: 1 Global Step: 24330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:29,710-Speed 3303.54 samples/sec Loss 9.0709 LearningRate 0.0814 Epoch: 1 Global Step: 24340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:32,712-Speed 3412.30 samples/sec Loss 9.2842 LearningRate 0.0814 Epoch: 1 Global Step: 24350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:35,768-Speed 3351.99 samples/sec Loss 9.1909 LearningRate 0.0813 Epoch: 1 Global Step: 24360 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:51:38,809-Speed 3368.85 samples/sec Loss 9.2665 LearningRate 0.0813 Epoch: 1 Global Step: 24370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:41,804-Speed 3419.60 samples/sec Loss 9.2371 LearningRate 0.0813 Epoch: 1 Global Step: 24380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:44,825-Speed 3390.69 samples/sec Loss 9.1785 LearningRate 0.0813 Epoch: 1 Global Step: 24390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:47,827-Speed 3412.21 samples/sec Loss 9.2497 LearningRate 0.0813 Epoch: 1 Global Step: 24400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:51:50,920-Speed 3312.23 samples/sec Loss 9.2693 LearningRate 0.0813 Epoch: 1 Global Step: 24410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:51:54,027-Speed 3297.10 samples/sec Loss 9.0870 LearningRate 0.0813 Epoch: 1 Global Step: 24420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:51:57,035-Speed 3405.51 samples/sec Loss 9.2276 LearningRate 0.0813 Epoch: 1 Global Step: 24430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:00,071-Speed 3373.19 samples/sec Loss 9.2385 LearningRate 0.0813 Epoch: 1 Global Step: 24440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:03,172-Speed 3304.07 samples/sec Loss 9.1759 LearningRate 0.0813 Epoch: 1 Global Step: 24450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:06,228-Speed 3351.57 samples/sec Loss 9.1479 LearningRate 0.0813 Epoch: 1 Global Step: 24460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:09,253-Speed 3386.26 samples/sec Loss 9.2314 LearningRate 0.0813 Epoch: 1 Global Step: 24470 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:12,275-Speed 3388.75 samples/sec Loss 9.0081 LearningRate 0.0813 Epoch: 1 Global Step: 24480 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:15,321-Speed 3363.83 
samples/sec Loss 9.2867 LearningRate 0.0813 Epoch: 1 Global Step: 24490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:18,322-Speed 3413.14 samples/sec Loss 9.1551 LearningRate 0.0812 Epoch: 1 Global Step: 24500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:21,349-Speed 3384.50 samples/sec Loss 9.2464 LearningRate 0.0812 Epoch: 1 Global Step: 24510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:24,392-Speed 3365.58 samples/sec Loss 9.1924 LearningRate 0.0812 Epoch: 1 Global Step: 24520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:27,405-Speed 3399.70 samples/sec Loss 9.0587 LearningRate 0.0812 Epoch: 1 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:52:30,446-Speed 3368.48 samples/sec Loss 9.2192 LearningRate 0.0812 Epoch: 1 Global Step: 24540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:52:33,484-Speed 3372.05 samples/sec Loss 9.2269 LearningRate 0.0812 Epoch: 1 Global Step: 24550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:52:36,585-Speed 3302.93 samples/sec Loss 9.2276 LearningRate 0.0812 Epoch: 1 Global Step: 24560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:52:39,689-Speed 3299.95 samples/sec Loss 9.3401 LearningRate 0.0812 Epoch: 1 Global Step: 24570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:52:42,755-Speed 3341.12 samples/sec Loss 9.2100 LearningRate 0.0812 Epoch: 1 Global Step: 24580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:52:45,764-Speed 3404.79 samples/sec Loss 9.1321 LearningRate 0.0812 Epoch: 1 Global Step: 24590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:48,827-Speed 3344.25 samples/sec Loss 9.0688 LearningRate 0.0812 Epoch: 1 Global Step: 24600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:51,847-Speed 3391.13 samples/sec Loss 9.1626 LearningRate 0.0812 Epoch: 1 
Global Step: 24610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:54,882-Speed 3374.74 samples/sec Loss 9.2894 LearningRate 0.0812 Epoch: 1 Global Step: 24620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:52:57,885-Speed 3410.94 samples/sec Loss 9.1703 LearningRate 0.0812 Epoch: 1 Global Step: 24630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:00,931-Speed 3363.44 samples/sec Loss 9.1349 LearningRate 0.0811 Epoch: 1 Global Step: 24640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:04,025-Speed 3310.05 samples/sec Loss 9.1326 LearningRate 0.0811 Epoch: 1 Global Step: 24650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:07,159-Speed 3268.90 samples/sec Loss 9.2363 LearningRate 0.0811 Epoch: 1 Global Step: 24660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:10,167-Speed 3404.91 samples/sec Loss 9.0441 LearningRate 0.0811 Epoch: 1 Global Step: 24670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:13,219-Speed 3357.21 samples/sec Loss 9.2049 LearningRate 0.0811 Epoch: 1 Global Step: 24680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:16,251-Speed 3377.73 samples/sec Loss 9.2717 LearningRate 0.0811 Epoch: 1 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:53:19,330-Speed 3326.95 samples/sec Loss 9.1284 LearningRate 0.0811 Epoch: 1 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:53:22,322-Speed 3423.46 samples/sec Loss 9.0873 LearningRate 0.0811 Epoch: 1 Global Step: 24710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:25,352-Speed 3380.68 samples/sec Loss 9.0851 LearningRate 0.0811 Epoch: 1 Global Step: 24720 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:28,490-Speed 3264.03 samples/sec Loss 9.2270 LearningRate 0.0811 Epoch: 1 Global Step: 24730 Fp16 Grad Scale: 16384 Required: 19 
hours Training: 2022-04-27 03:53:31,588-Speed 3305.84 samples/sec Loss 9.1926 LearningRate 0.0811 Epoch: 1 Global Step: 24740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:34,639-Speed 3357.57 samples/sec Loss 9.2136 LearningRate 0.0811 Epoch: 1 Global Step: 24750 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:37,697-Speed 3349.59 samples/sec Loss 9.1516 LearningRate 0.0811 Epoch: 1 Global Step: 24760 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:40,742-Speed 3364.60 samples/sec Loss 9.1569 LearningRate 0.0811 Epoch: 1 Global Step: 24770 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:43,790-Speed 3360.01 samples/sec Loss 9.0104 LearningRate 0.0810 Epoch: 1 Global Step: 24780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:46,810-Speed 3392.71 samples/sec Loss 9.0199 LearningRate 0.0810 Epoch: 1 Global Step: 24790 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:49,881-Speed 3335.52 samples/sec Loss 9.1359 LearningRate 0.0810 Epoch: 1 Global Step: 24800 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:52,983-Speed 3301.57 samples/sec Loss 9.1670 LearningRate 0.0810 Epoch: 1 Global Step: 24810 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:53:56,084-Speed 3303.13 samples/sec Loss 9.2920 LearningRate 0.0810 Epoch: 1 Global Step: 24820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:53:59,120-Speed 3373.56 samples/sec Loss 9.0979 LearningRate 0.0810 Epoch: 1 Global Step: 24830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:02,386-Speed 3135.95 samples/sec Loss 9.0534 LearningRate 0.0810 Epoch: 1 Global Step: 24840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:33,692-Speed 327.12 samples/sec Loss 7.8858 LearningRate 0.0810 Epoch: 2 Global Step: 24850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:36,876-Speed 3216.84 
samples/sec Loss 7.5455 LearningRate 0.0810 Epoch: 2 Global Step: 24860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:39,938-Speed 3345.50 samples/sec Loss 7.4219 LearningRate 0.0810 Epoch: 2 Global Step: 24870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:42,992-Speed 3353.90 samples/sec Loss 7.3821 LearningRate 0.0810 Epoch: 2 Global Step: 24880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:45,984-Speed 3424.04 samples/sec Loss 7.4470 LearningRate 0.0810 Epoch: 2 Global Step: 24890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:49,028-Speed 3365.43 samples/sec Loss 7.3334 LearningRate 0.0810 Epoch: 2 Global Step: 24900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:52,136-Speed 3295.64 samples/sec Loss 7.4329 LearningRate 0.0810 Epoch: 2 Global Step: 24910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:54:55,178-Speed 3367.43 samples/sec Loss 7.3375 LearningRate 0.0809 Epoch: 2 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:54:58,169-Speed 3424.82 samples/sec Loss 7.2450 LearningRate 0.0809 Epoch: 2 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:55:01,199-Speed 3380.74 samples/sec Loss 7.3903 LearningRate 0.0809 Epoch: 2 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:55:04,235-Speed 3373.82 samples/sec Loss 7.4915 LearningRate 0.0809 Epoch: 2 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:55:07,309-Speed 3331.73 samples/sec Loss 7.4613 LearningRate 0.0809 Epoch: 2 Global Step: 24960 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:10,363-Speed 3355.12 samples/sec Loss 7.2986 LearningRate 0.0809 Epoch: 2 Global Step: 24970 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:13,411-Speed 3360.20 samples/sec Loss 7.3834 LearningRate 0.0809 Epoch: 2 
Global Step: 24980 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:16,564-Speed 3249.63 samples/sec Loss 7.4475 LearningRate 0.0809 Epoch: 2 Global Step: 24990 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:19,579-Speed 3397.08 samples/sec Loss 7.4533 LearningRate 0.0809 Epoch: 2 Global Step: 25000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:22,640-Speed 3346.53 samples/sec Loss 7.3661 LearningRate 0.0809 Epoch: 2 Global Step: 25010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:25,694-Speed 3354.42 samples/sec Loss 7.4789 LearningRate 0.0809 Epoch: 2 Global Step: 25020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:28,767-Speed 3332.65 samples/sec Loss 7.4635 LearningRate 0.0809 Epoch: 2 Global Step: 25030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:31,805-Speed 3372.19 samples/sec Loss 7.4563 LearningRate 0.0809 Epoch: 2 Global Step: 25040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:34,837-Speed 3377.22 samples/sec Loss 7.3875 LearningRate 0.0808 Epoch: 2 Global Step: 25050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:37,901-Speed 3343.98 samples/sec Loss 7.4821 LearningRate 0.0808 Epoch: 2 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:55:40,955-Speed 3353.91 samples/sec Loss 7.6099 LearningRate 0.0808 Epoch: 2 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:55:44,023-Speed 3338.38 samples/sec Loss 7.4913 LearningRate 0.0808 Epoch: 2 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:55:47,111-Speed 3318.02 samples/sec Loss 7.5069 LearningRate 0.0808 Epoch: 2 Global Step: 25090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:50,203-Speed 3312.05 samples/sec Loss 7.4948 LearningRate 0.0808 Epoch: 2 Global Step: 25100 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:55:53,748-Speed 2889.55 samples/sec Loss 7.5983 LearningRate 0.0808 Epoch: 2 Global Step: 25110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:56,788-Speed 3370.04 samples/sec Loss 7.5333 LearningRate 0.0808 Epoch: 2 Global Step: 25120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:55:59,805-Speed 3394.22 samples/sec Loss 7.4896 LearningRate 0.0808 Epoch: 2 Global Step: 25130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:56:02,890-Speed 3321.01 samples/sec Loss 7.5659 LearningRate 0.0808 Epoch: 2 Global Step: 25140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:56:05,951-Speed 3346.29 samples/sec Loss 7.5062 LearningRate 0.0808 Epoch: 2 Global Step: 25150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:56:08,991-Speed 3368.67 samples/sec Loss 7.4968 LearningRate 0.0808 Epoch: 2 Global Step: 25160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:56:12,052-Speed 3346.48 samples/sec Loss 7.4960 LearningRate 0.0808 Epoch: 2 Global Step: 25170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:56:15,169-Speed 3286.43 samples/sec Loss 7.4201 LearningRate 0.0808 Epoch: 2 Global Step: 25180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:56:18,240-Speed 3335.91 samples/sec Loss 7.5363 LearningRate 0.0807 Epoch: 2 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:21,308-Speed 3338.11 samples/sec Loss 7.6023 LearningRate 0.0807 Epoch: 2 Global Step: 25200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:24,347-Speed 3371.12 samples/sec Loss 7.5693 LearningRate 0.0807 Epoch: 2 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:27,377-Speed 3380.72 samples/sec Loss 7.6538 LearningRate 0.0807 Epoch: 2 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:30,437-Speed 3347.40 
samples/sec Loss 7.4741 LearningRate 0.0807 Epoch: 2 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:33,468-Speed 3379.66 samples/sec Loss 7.6306 LearningRate 0.0807 Epoch: 2 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:36,544-Speed 3330.74 samples/sec Loss 7.6313 LearningRate 0.0807 Epoch: 2 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:39,591-Speed 3361.54 samples/sec Loss 7.5582 LearningRate 0.0807 Epoch: 2 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:42,669-Speed 3327.55 samples/sec Loss 7.5693 LearningRate 0.0807 Epoch: 2 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:45,693-Speed 3388.02 samples/sec Loss 7.6949 LearningRate 0.0807 Epoch: 2 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:48,761-Speed 3338.69 samples/sec Loss 7.6293 LearningRate 0.0807 Epoch: 2 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:56:51,822-Speed 3346.76 samples/sec Loss 7.5464 LearningRate 0.0807 Epoch: 2 Global Step: 25300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:56:54,827-Speed 3408.36 samples/sec Loss 7.6084 LearningRate 0.0807 Epoch: 2 Global Step: 25310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:56:57,835-Speed 3404.98 samples/sec Loss 7.6786 LearningRate 0.0807 Epoch: 2 Global Step: 25320 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:00,871-Speed 3373.80 samples/sec Loss 7.6619 LearningRate 0.0806 Epoch: 2 Global Step: 25330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:03,964-Speed 3311.83 samples/sec Loss 7.6721 LearningRate 0.0806 Epoch: 2 Global Step: 25340 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:06,996-Speed 3378.54 samples/sec Loss 7.5843 LearningRate 0.0806 Epoch: 2 
Global Step: 25350 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:09,985-Speed 3427.95 samples/sec Loss 7.7252 LearningRate 0.0806 Epoch: 2 Global Step: 25360 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:13,057-Speed 3334.62 samples/sec Loss 7.6902 LearningRate 0.0806 Epoch: 2 Global Step: 25370 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:16,117-Speed 3347.13 samples/sec Loss 7.6424 LearningRate 0.0806 Epoch: 2 Global Step: 25380 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:19,131-Speed 3398.17 samples/sec Loss 7.6745 LearningRate 0.0806 Epoch: 2 Global Step: 25390 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:22,160-Speed 3381.50 samples/sec Loss 7.6560 LearningRate 0.0806 Epoch: 2 Global Step: 25400 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:57:25,171-Speed 3401.70 samples/sec Loss 7.6236 LearningRate 0.0806 Epoch: 2 Global Step: 25410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:28,224-Speed 3355.30 samples/sec Loss 7.6308 LearningRate 0.0806 Epoch: 2 Global Step: 25420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:31,282-Speed 3350.33 samples/sec Loss 7.6953 LearningRate 0.0806 Epoch: 2 Global Step: 25430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:34,326-Speed 3364.47 samples/sec Loss 7.7428 LearningRate 0.0806 Epoch: 2 Global Step: 25440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:37,345-Speed 3393.58 samples/sec Loss 7.6879 LearningRate 0.0806 Epoch: 2 Global Step: 25450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:40,386-Speed 3368.56 samples/sec Loss 7.6968 LearningRate 0.0806 Epoch: 2 Global Step: 25460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:43,460-Speed 3332.09 samples/sec Loss 7.6530 LearningRate 0.0805 Epoch: 2 Global Step: 25470 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 03:57:46,510-Speed 3357.69 samples/sec Loss 7.8024 LearningRate 0.0805 Epoch: 2 Global Step: 25480 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:49,524-Speed 3399.05 samples/sec Loss 7.7699 LearningRate 0.0805 Epoch: 2 Global Step: 25490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:52,598-Speed 3331.54 samples/sec Loss 7.6578 LearningRate 0.0805 Epoch: 2 Global Step: 25500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:57:55,652-Speed 3354.17 samples/sec Loss 7.6932 LearningRate 0.0805 Epoch: 2 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:57:58,670-Speed 3394.16 samples/sec Loss 7.6370 LearningRate 0.0805 Epoch: 2 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:58:01,706-Speed 3374.11 samples/sec Loss 7.7572 LearningRate 0.0805 Epoch: 2 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:58:04,738-Speed 3378.27 samples/sec Loss 7.6642 LearningRate 0.0805 Epoch: 2 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:58:07,770-Speed 3378.38 samples/sec Loss 7.7811 LearningRate 0.0805 Epoch: 2 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:58:10,773-Speed 3410.53 samples/sec Loss 7.7584 LearningRate 0.0805 Epoch: 2 Global Step: 25560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:13,843-Speed 3337.47 samples/sec Loss 7.7077 LearningRate 0.0805 Epoch: 2 Global Step: 25570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:16,899-Speed 3350.87 samples/sec Loss 7.8650 LearningRate 0.0805 Epoch: 2 Global Step: 25580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:19,925-Speed 3385.57 samples/sec Loss 7.8096 LearningRate 0.0805 Epoch: 2 Global Step: 25590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:22,985-Speed 3346.98 
samples/sec Loss 7.7782 LearningRate 0.0805 Epoch: 2 Global Step: 25600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:26,049-Speed 3343.46 samples/sec Loss 7.7828 LearningRate 0.0804 Epoch: 2 Global Step: 25610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:29,086-Speed 3372.19 samples/sec Loss 7.7731 LearningRate 0.0804 Epoch: 2 Global Step: 25620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:32,135-Speed 3360.21 samples/sec Loss 7.7534 LearningRate 0.0804 Epoch: 2 Global Step: 25630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:35,179-Speed 3364.40 samples/sec Loss 7.8282 LearningRate 0.0804 Epoch: 2 Global Step: 25640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:38,244-Speed 3342.09 samples/sec Loss 7.9117 LearningRate 0.0804 Epoch: 2 Global Step: 25650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:41,283-Speed 3370.25 samples/sec Loss 7.9005 LearningRate 0.0804 Epoch: 2 Global Step: 25660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:58:44,302-Speed 3393.82 samples/sec Loss 7.8311 LearningRate 0.0804 Epoch: 2 Global Step: 25670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 03:58:47,317-Speed 3397.00 samples/sec Loss 7.7683 LearningRate 0.0804 Epoch: 2 Global Step: 25680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:50,337-Speed 3392.25 samples/sec Loss 7.8302 LearningRate 0.0804 Epoch: 2 Global Step: 25690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:53,379-Speed 3367.04 samples/sec Loss 7.8164 LearningRate 0.0804 Epoch: 2 Global Step: 25700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:56,430-Speed 3357.21 samples/sec Loss 7.8560 LearningRate 0.0804 Epoch: 2 Global Step: 25710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:58:59,469-Speed 3370.54 samples/sec Loss 7.7331 LearningRate 0.0804 Epoch: 2 
Global Step: 25720 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:59:02,545-Speed 3330.09 samples/sec Loss 7.9048 LearningRate 0.0804 Epoch: 2 Global Step: 25730 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:59:05,640-Speed 3309.71 samples/sec Loss 7.8234 LearningRate 0.0804 Epoch: 2 Global Step: 25740 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:59:08,676-Speed 3373.37 samples/sec Loss 7.9149 LearningRate 0.0803 Epoch: 2 Global Step: 25750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:59:11,708-Speed 3377.93 samples/sec Loss 7.8427 LearningRate 0.0803 Epoch: 2 Global Step: 25760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:59:14,758-Speed 3358.72 samples/sec Loss 7.8162 LearningRate 0.0803 Epoch: 2 Global Step: 25770 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:17,774-Speed 3396.03 samples/sec Loss 7.7708 LearningRate 0.0803 Epoch: 2 Global Step: 25780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:20,826-Speed 3356.33 samples/sec Loss 7.7321 LearningRate 0.0803 Epoch: 2 Global Step: 25790 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:23,916-Speed 3315.46 samples/sec Loss 7.8335 LearningRate 0.0803 Epoch: 2 Global Step: 25800 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:26,940-Speed 3387.13 samples/sec Loss 7.8575 LearningRate 0.0803 Epoch: 2 Global Step: 25810 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:30,037-Speed 3307.31 samples/sec Loss 7.9437 LearningRate 0.0803 Epoch: 2 Global Step: 25820 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:33,063-Speed 3384.91 samples/sec Loss 7.9189 LearningRate 0.0803 Epoch: 2 Global Step: 25830 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:36,118-Speed 3353.57 samples/sec Loss 7.8904 LearningRate 0.0803 Epoch: 2 Global Step: 25840 Fp16 Grad Scale: 16384 Required: 19 
hours Training: 2022-04-27 03:59:39,174-Speed 3351.88 samples/sec Loss 7.8688 LearningRate 0.0803 Epoch: 2 Global Step: 25850 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:42,225-Speed 3356.68 samples/sec Loss 7.7770 LearningRate 0.0803 Epoch: 2 Global Step: 25860 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:45,255-Speed 3381.62 samples/sec Loss 7.8688 LearningRate 0.0803 Epoch: 2 Global Step: 25870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 03:59:48,279-Speed 3387.28 samples/sec Loss 7.8841 LearningRate 0.0802 Epoch: 2 Global Step: 25880 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:51,342-Speed 3343.47 samples/sec Loss 7.8660 LearningRate 0.0802 Epoch: 2 Global Step: 25890 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:54,422-Speed 3326.75 samples/sec Loss 7.8390 LearningRate 0.0802 Epoch: 2 Global Step: 25900 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 03:59:57,468-Speed 3362.56 samples/sec Loss 7.9365 LearningRate 0.0802 Epoch: 2 Global Step: 25910 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:00:00,475-Speed 3405.86 samples/sec Loss 7.8123 LearningRate 0.0802 Epoch: 2 Global Step: 25920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:00:03,512-Speed 3372.95 samples/sec Loss 7.8851 LearningRate 0.0802 Epoch: 2 Global Step: 25930 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:00:06,581-Speed 3337.47 samples/sec Loss 7.8642 LearningRate 0.0802 Epoch: 2 Global Step: 25940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:00:09,623-Speed 3367.50 samples/sec Loss 7.8007 LearningRate 0.0802 Epoch: 2 Global Step: 25950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:00:12,699-Speed 3330.23 samples/sec Loss 7.9640 LearningRate 0.0802 Epoch: 2 Global Step: 25960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:00:15,708-Speed 3403.87 
samples/sec Loss 7.8573 LearningRate 0.0802 Epoch: 2 Global Step: 25970 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:00:18,755-Speed 3362.48 samples/sec Loss 7.9011 LearningRate 0.0802 Epoch: 2 Global Step: 25980 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:21,749-Speed 3420.58 samples/sec Loss 7.9118 LearningRate 0.0802 Epoch: 2 Global Step: 25990 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:24,834-Speed 3320.74 samples/sec Loss 7.9706 LearningRate 0.0802 Epoch: 2 Global Step: 26000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:27,871-Speed 3372.61 samples/sec Loss 7.9549 LearningRate 0.0802 Epoch: 2 Global Step: 26010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:30,929-Speed 3349.87 samples/sec Loss 7.9549 LearningRate 0.0801 Epoch: 2 Global Step: 26020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:33,939-Speed 3402.73 samples/sec Loss 8.0522 LearningRate 0.0801 Epoch: 2 Global Step: 26030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:36,990-Speed 3358.39 samples/sec Loss 8.0563 LearningRate 0.0801 Epoch: 2 Global Step: 26040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:39,998-Speed 3404.92 samples/sec Loss 7.8904 LearningRate 0.0801 Epoch: 2 Global Step: 26050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:42,995-Speed 3417.71 samples/sec Loss 7.9201 LearningRate 0.0801 Epoch: 2 Global Step: 26060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:46,004-Speed 3405.02 samples/sec Loss 7.9738 LearningRate 0.0801 Epoch: 2 Global Step: 26070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:00:49,094-Speed 3314.68 samples/sec Loss 7.7941 LearningRate 0.0801 Epoch: 2 Global Step: 26080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:00:52,109-Speed 3396.49 samples/sec Loss 7.9640 LearningRate 0.0801 Epoch: 2 
Global Step: 26090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:00:55,126-Speed 3396.31 samples/sec Loss 7.8583 LearningRate 0.0801 Epoch: 2 Global Step: 26100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:00:58,125-Speed 3414.76 samples/sec Loss 7.8843 LearningRate 0.0801 Epoch: 2 Global Step: 26110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:01,133-Speed 3406.18 samples/sec Loss 7.9636 LearningRate 0.0801 Epoch: 2 Global Step: 26120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:04,171-Speed 3371.22 samples/sec Loss 7.8737 LearningRate 0.0801 Epoch: 2 Global Step: 26130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:07,273-Speed 3302.10 samples/sec Loss 7.9214 LearningRate 0.0801 Epoch: 2 Global Step: 26140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:10,291-Speed 3394.83 samples/sec Loss 7.9181 LearningRate 0.0801 Epoch: 2 Global Step: 26150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:13,300-Speed 3403.99 samples/sec Loss 7.9621 LearningRate 0.0800 Epoch: 2 Global Step: 26160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:16,347-Speed 3361.83 samples/sec Loss 7.8728 LearningRate 0.0800 Epoch: 2 Global Step: 26170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:19,379-Speed 3378.16 samples/sec Loss 7.9715 LearningRate 0.0800 Epoch: 2 Global Step: 26180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:22,377-Speed 3415.91 samples/sec Loss 7.9673 LearningRate 0.0800 Epoch: 2 Global Step: 26190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:25,417-Speed 3370.87 samples/sec Loss 7.9634 LearningRate 0.0800 Epoch: 2 Global Step: 26200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:28,544-Speed 3275.65 samples/sec Loss 8.0665 LearningRate 0.0800 Epoch: 2 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-04-27 04:01:31,574-Speed 3380.28 samples/sec Loss 8.0161 LearningRate 0.0800 Epoch: 2 Global Step: 26220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:34,608-Speed 3376.29 samples/sec Loss 8.0576 LearningRate 0.0800 Epoch: 2 Global Step: 26230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:37,699-Speed 3313.96 samples/sec Loss 8.0426 LearningRate 0.0800 Epoch: 2 Global Step: 26240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:40,808-Speed 3294.46 samples/sec Loss 8.0973 LearningRate 0.0800 Epoch: 2 Global Step: 26250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:43,850-Speed 3367.64 samples/sec Loss 8.0238 LearningRate 0.0800 Epoch: 2 Global Step: 26260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:46,898-Speed 3359.91 samples/sec Loss 8.1084 LearningRate 0.0800 Epoch: 2 Global Step: 26270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:49,959-Speed 3346.66 samples/sec Loss 8.0490 LearningRate 0.0800 Epoch: 2 Global Step: 26280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:53,018-Speed 3348.46 samples/sec Loss 7.8953 LearningRate 0.0800 Epoch: 2 Global Step: 26290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:56,051-Speed 3377.72 samples/sec Loss 7.9324 LearningRate 0.0799 Epoch: 2 Global Step: 26300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:01:59,065-Speed 3398.12 samples/sec Loss 7.9033 LearningRate 0.0799 Epoch: 2 Global Step: 26310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:02,122-Speed 3351.30 samples/sec Loss 7.9574 LearningRate 0.0799 Epoch: 2 Global Step: 26320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:05,186-Speed 3343.42 samples/sec Loss 8.0549 LearningRate 0.0799 Epoch: 2 Global Step: 26330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:08,242-Speed 3351.67 
samples/sec Loss 7.9715 LearningRate 0.0799 Epoch: 2 Global Step: 26340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:11,293-Speed 3357.24 samples/sec Loss 8.0870 LearningRate 0.0799 Epoch: 2 Global Step: 26350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:14,333-Speed 3370.16 samples/sec Loss 8.1927 LearningRate 0.0799 Epoch: 2 Global Step: 26360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:17,353-Speed 3391.01 samples/sec Loss 8.2050 LearningRate 0.0799 Epoch: 2 Global Step: 26370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:20,374-Speed 3391.39 samples/sec Loss 8.0219 LearningRate 0.0799 Epoch: 2 Global Step: 26380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:23,437-Speed 3344.23 samples/sec Loss 8.0157 LearningRate 0.0799 Epoch: 2 Global Step: 26390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:26,453-Speed 3395.19 samples/sec Loss 8.0795 LearningRate 0.0799 Epoch: 2 Global Step: 26400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:02:29,471-Speed 3394.91 samples/sec Loss 7.9810 LearningRate 0.0799 Epoch: 2 Global Step: 26410 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:32,488-Speed 3395.03 samples/sec Loss 8.0806 LearningRate 0.0799 Epoch: 2 Global Step: 26420 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:35,498-Speed 3403.00 samples/sec Loss 8.0635 LearningRate 0.0799 Epoch: 2 Global Step: 26430 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:38,554-Speed 3351.61 samples/sec Loss 8.0958 LearningRate 0.0798 Epoch: 2 Global Step: 26440 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:41,642-Speed 3317.64 samples/sec Loss 8.1554 LearningRate 0.0798 Epoch: 2 Global Step: 26450 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:44,676-Speed 3375.75 samples/sec Loss 8.1784 LearningRate 0.0798 Epoch: 2 
Global Step: 26460 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:47,688-Speed 3401.27 samples/sec Loss 8.0284 LearningRate 0.0798 Epoch: 2 Global Step: 26470 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:50,710-Speed 3389.98 samples/sec Loss 8.1288 LearningRate 0.0798 Epoch: 2 Global Step: 26480 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:53,728-Speed 3393.07 samples/sec Loss 7.9987 LearningRate 0.0798 Epoch: 2 Global Step: 26490 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:56,773-Speed 3363.64 samples/sec Loss 8.0799 LearningRate 0.0798 Epoch: 2 Global Step: 26500 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:02:59,838-Speed 3342.16 samples/sec Loss 8.0263 LearningRate 0.0798 Epoch: 2 Global Step: 26510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:02,867-Speed 3381.65 samples/sec Loss 8.1630 LearningRate 0.0798 Epoch: 2 Global Step: 26520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:05,906-Speed 3371.02 samples/sec Loss 8.1752 LearningRate 0.0798 Epoch: 2 Global Step: 26530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:08,911-Speed 3408.37 samples/sec Loss 8.0703 LearningRate 0.0798 Epoch: 2 Global Step: 26540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:11,969-Speed 3350.86 samples/sec Loss 8.0604 LearningRate 0.0798 Epoch: 2 Global Step: 26550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:15,029-Speed 3346.43 samples/sec Loss 8.0499 LearningRate 0.0798 Epoch: 2 Global Step: 26560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:18,085-Speed 3352.87 samples/sec Loss 8.0735 LearningRate 0.0798 Epoch: 2 Global Step: 26570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:21,072-Speed 3429.51 samples/sec Loss 8.1454 LearningRate 0.0797 Epoch: 2 Global Step: 26580 Fp16 Grad Scale: 16384 Required: 19 
hours Training: 2022-04-27 04:03:24,123-Speed 3356.68 samples/sec Loss 8.0803 LearningRate 0.0797 Epoch: 2 Global Step: 26590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:27,166-Speed 3366.38 samples/sec Loss 8.1976 LearningRate 0.0797 Epoch: 2 Global Step: 26600 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:30,303-Speed 3265.45 samples/sec Loss 8.0523 LearningRate 0.0797 Epoch: 2 Global Step: 26610 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:33,330-Speed 3384.32 samples/sec Loss 8.0245 LearningRate 0.0797 Epoch: 2 Global Step: 26620 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:36,350-Speed 3391.89 samples/sec Loss 8.1664 LearningRate 0.0797 Epoch: 2 Global Step: 26630 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:39,409-Speed 3348.21 samples/sec Loss 8.0094 LearningRate 0.0797 Epoch: 2 Global Step: 26640 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:42,440-Speed 3380.66 samples/sec Loss 8.1076 LearningRate 0.0797 Epoch: 2 Global Step: 26650 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:45,452-Speed 3400.09 samples/sec Loss 7.9597 LearningRate 0.0797 Epoch: 2 Global Step: 26660 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:48,500-Speed 3361.46 samples/sec Loss 8.0711 LearningRate 0.0797 Epoch: 2 Global Step: 26670 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:03:51,511-Speed 3401.28 samples/sec Loss 8.2587 LearningRate 0.0797 Epoch: 2 Global Step: 26680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:54,555-Speed 3365.66 samples/sec Loss 8.2832 LearningRate 0.0797 Epoch: 2 Global Step: 26690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:03:57,564-Speed 3403.91 samples/sec Loss 8.0628 LearningRate 0.0797 Epoch: 2 Global Step: 26700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:00,594-Speed 3380.34 
samples/sec Loss 8.1793 LearningRate 0.0797 Epoch: 2 Global Step: 26710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:03,649-Speed 3353.66 samples/sec Loss 8.1481 LearningRate 0.0796 Epoch: 2 Global Step: 26720 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:06,648-Speed 3415.02 samples/sec Loss 8.1685 LearningRate 0.0796 Epoch: 2 Global Step: 26730 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:09,639-Speed 3424.65 samples/sec Loss 8.1050 LearningRate 0.0796 Epoch: 2 Global Step: 26740 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:12,667-Speed 3383.03 samples/sec Loss 8.0822 LearningRate 0.0796 Epoch: 2 Global Step: 26750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:15,739-Speed 3334.58 samples/sec Loss 8.1434 LearningRate 0.0796 Epoch: 2 Global Step: 26760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:18,781-Speed 3367.42 samples/sec Loss 8.0992 LearningRate 0.0796 Epoch: 2 Global Step: 26770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:21,792-Speed 3402.37 samples/sec Loss 8.1079 LearningRate 0.0796 Epoch: 2 Global Step: 26780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:04:24,802-Speed 3402.81 samples/sec Loss 8.1020 LearningRate 0.0796 Epoch: 2 Global Step: 26790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:27,850-Speed 3360.70 samples/sec Loss 8.0622 LearningRate 0.0796 Epoch: 2 Global Step: 26800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:30,845-Speed 3420.27 samples/sec Loss 8.1468 LearningRate 0.0796 Epoch: 2 Global Step: 26810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:33,840-Speed 3420.00 samples/sec Loss 8.2214 LearningRate 0.0796 Epoch: 2 Global Step: 26820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:36,926-Speed 3318.79 samples/sec Loss 8.2519 LearningRate 0.0796 Epoch: 2 
Global Step: 26830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:39,928-Speed 3412.64 samples/sec Loss 8.0556 LearningRate 0.0796 Epoch: 2 Global Step: 26840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:42,984-Speed 3351.22 samples/sec Loss 8.1679 LearningRate 0.0796 Epoch: 2 Global Step: 26850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:46,017-Speed 3378.47 samples/sec Loss 8.1597 LearningRate 0.0795 Epoch: 2 Global Step: 26860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:49,045-Speed 3382.30 samples/sec Loss 8.1889 LearningRate 0.0795 Epoch: 2 Global Step: 26870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:52,104-Speed 3348.87 samples/sec Loss 8.2128 LearningRate 0.0795 Epoch: 2 Global Step: 26880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:55,146-Speed 3367.34 samples/sec Loss 8.2461 LearningRate 0.0795 Epoch: 2 Global Step: 26890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:04:58,146-Speed 3413.83 samples/sec Loss 8.2398 LearningRate 0.0795 Epoch: 2 Global Step: 26900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:01,264-Speed 3285.50 samples/sec Loss 8.1190 LearningRate 0.0795 Epoch: 2 Global Step: 26910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:04,323-Speed 3348.69 samples/sec Loss 8.2606 LearningRate 0.0795 Epoch: 2 Global Step: 26920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:07,347-Speed 3387.26 samples/sec Loss 8.2171 LearningRate 0.0795 Epoch: 2 Global Step: 26930 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:10,408-Speed 3346.12 samples/sec Loss 8.1779 LearningRate 0.0795 Epoch: 2 Global Step: 26940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:13,450-Speed 3367.77 samples/sec Loss 8.2080 LearningRate 0.0795 Epoch: 2 Global Step: 26950 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:05:16,549-Speed 3305.09 samples/sec Loss 8.0870 LearningRate 0.0795 Epoch: 2 Global Step: 26960 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:19,577-Speed 3383.71 samples/sec Loss 8.2325 LearningRate 0.0795 Epoch: 2 Global Step: 26970 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:22,602-Speed 3385.86 samples/sec Loss 8.1480 LearningRate 0.0795 Epoch: 2 Global Step: 26980 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:25,597-Speed 3420.07 samples/sec Loss 8.1564 LearningRate 0.0795 Epoch: 2 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:05:28,647-Speed 3358.12 samples/sec Loss 8.1314 LearningRate 0.0794 Epoch: 2 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:05:31,640-Speed 3422.84 samples/sec Loss 8.2442 LearningRate 0.0794 Epoch: 2 Global Step: 27010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:34,637-Speed 3418.30 samples/sec Loss 8.2845 LearningRate 0.0794 Epoch: 2 Global Step: 27020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:37,679-Speed 3366.49 samples/sec Loss 8.1898 LearningRate 0.0794 Epoch: 2 Global Step: 27030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:40,764-Speed 3320.21 samples/sec Loss 8.2589 LearningRate 0.0794 Epoch: 2 Global Step: 27040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:43,781-Speed 3395.35 samples/sec Loss 8.1703 LearningRate 0.0794 Epoch: 2 Global Step: 27050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:46,792-Speed 3401.68 samples/sec Loss 8.1776 LearningRate 0.0794 Epoch: 2 Global Step: 27060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:49,807-Speed 3397.27 samples/sec Loss 8.2874 LearningRate 0.0794 Epoch: 2 Global Step: 27070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:52,820-Speed 3400.55 
samples/sec Loss 8.1265 LearningRate 0.0794 Epoch: 2 Global Step: 27080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:55,864-Speed 3364.42 samples/sec Loss 8.1522 LearningRate 0.0794 Epoch: 2 Global Step: 27090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:05:58,880-Speed 3396.58 samples/sec Loss 8.1123 LearningRate 0.0794 Epoch: 2 Global Step: 27100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:01,914-Speed 3376.34 samples/sec Loss 8.2080 LearningRate 0.0794 Epoch: 2 Global Step: 27110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:06:04,938-Speed 3387.67 samples/sec Loss 8.1044 LearningRate 0.0794 Epoch: 2 Global Step: 27120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:07,956-Speed 3393.88 samples/sec Loss 8.1835 LearningRate 0.0794 Epoch: 2 Global Step: 27130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:10,956-Speed 3413.53 samples/sec Loss 8.1804 LearningRate 0.0793 Epoch: 2 Global Step: 27140 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:14,027-Speed 3336.02 samples/sec Loss 8.2510 LearningRate 0.0793 Epoch: 2 Global Step: 27150 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:17,102-Speed 3330.92 samples/sec Loss 8.2093 LearningRate 0.0793 Epoch: 2 Global Step: 27160 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:20,141-Speed 3370.75 samples/sec Loss 8.2036 LearningRate 0.0793 Epoch: 2 Global Step: 27170 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:23,204-Speed 3343.41 samples/sec Loss 8.2845 LearningRate 0.0793 Epoch: 2 Global Step: 27180 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:26,263-Speed 3349.02 samples/sec Loss 8.3393 LearningRate 0.0793 Epoch: 2 Global Step: 27190 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:29,277-Speed 3398.29 samples/sec Loss 8.1905 LearningRate 0.0793 Epoch: 2 
Global Step: 27200 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:32,325-Speed 3360.41 samples/sec Loss 8.2490 LearningRate 0.0793 Epoch: 2 Global Step: 27210 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:35,380-Speed 3353.72 samples/sec Loss 8.2459 LearningRate 0.0793 Epoch: 2 Global Step: 27220 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:38,429-Speed 3359.51 samples/sec Loss 8.2937 LearningRate 0.0793 Epoch: 2 Global Step: 27230 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:06:41,453-Speed 3387.24 samples/sec Loss 8.3321 LearningRate 0.0793 Epoch: 2 Global Step: 27240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:44,476-Speed 3388.00 samples/sec Loss 8.2785 LearningRate 0.0793 Epoch: 2 Global Step: 27250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:47,537-Speed 3346.94 samples/sec Loss 8.1873 LearningRate 0.0793 Epoch: 2 Global Step: 27260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:50,638-Speed 3303.05 samples/sec Loss 8.2590 LearningRate 0.0793 Epoch: 2 Global Step: 27270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:53,725-Speed 3318.46 samples/sec Loss 8.3179 LearningRate 0.0792 Epoch: 2 Global Step: 27280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:56,793-Speed 3338.81 samples/sec Loss 8.1970 LearningRate 0.0792 Epoch: 2 Global Step: 27290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:06:59,833-Speed 3369.37 samples/sec Loss 8.3047 LearningRate 0.0792 Epoch: 2 Global Step: 27300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:02,869-Speed 3373.65 samples/sec Loss 8.1018 LearningRate 0.0792 Epoch: 2 Global Step: 27310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:05,920-Speed 3358.19 samples/sec Loss 8.1852 LearningRate 0.0792 Epoch: 2 Global Step: 27320 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:07:08,956-Speed 3372.88 samples/sec Loss 8.1982 LearningRate 0.0792 Epoch: 2 Global Step: 27330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:11,973-Speed 3395.15 samples/sec Loss 8.3876 LearningRate 0.0792 Epoch: 2 Global Step: 27340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:07:15,061-Speed 3317.08 samples/sec Loss 8.2514 LearningRate 0.0792 Epoch: 2 Global Step: 27350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:07:18,093-Speed 3379.14 samples/sec Loss 8.3245 LearningRate 0.0792 Epoch: 2 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:07:21,120-Speed 3383.49 samples/sec Loss 8.2691 LearningRate 0.0792 Epoch: 2 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:07:24,164-Speed 3365.81 samples/sec Loss 8.3551 LearningRate 0.0792 Epoch: 2 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:07:27,194-Speed 3381.02 samples/sec Loss 8.2694 LearningRate 0.0792 Epoch: 2 Global Step: 27390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:30,286-Speed 3312.09 samples/sec Loss 8.2805 LearningRate 0.0792 Epoch: 2 Global Step: 27400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:33,329-Speed 3366.75 samples/sec Loss 8.2402 LearningRate 0.0791 Epoch: 2 Global Step: 27410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:36,355-Speed 3384.68 samples/sec Loss 8.4055 LearningRate 0.0791 Epoch: 2 Global Step: 27420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:39,441-Speed 3318.78 samples/sec Loss 8.2180 LearningRate 0.0791 Epoch: 2 Global Step: 27430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:42,510-Speed 3338.02 samples/sec Loss 8.3026 LearningRate 0.0791 Epoch: 2 Global Step: 27440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:45,527-Speed 3395.82 
samples/sec Loss 8.3944 LearningRate 0.0791 Epoch: 2 Global Step: 27450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:48,634-Speed 3295.77 samples/sec Loss 8.2301 LearningRate 0.0791 Epoch: 2 Global Step: 27460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:07:51,718-Speed 3322.49 samples/sec Loss 8.2024 LearningRate 0.0791 Epoch: 2 Global Step: 27470 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:07:54,826-Speed 3295.51 samples/sec Loss 8.3236 LearningRate 0.0791 Epoch: 2 Global Step: 27480 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:07:57,890-Speed 3343.38 samples/sec Loss 8.3555 LearningRate 0.0791 Epoch: 2 Global Step: 27490 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:00,917-Speed 3384.56 samples/sec Loss 8.2507 LearningRate 0.0791 Epoch: 2 Global Step: 27500 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:03,948-Speed 3378.59 samples/sec Loss 8.3349 LearningRate 0.0791 Epoch: 2 Global Step: 27510 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:06,964-Speed 3396.31 samples/sec Loss 8.3303 LearningRate 0.0791 Epoch: 2 Global Step: 27520 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:09,961-Speed 3418.06 samples/sec Loss 8.3404 LearningRate 0.0791 Epoch: 2 Global Step: 27530 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:13,038-Speed 3329.54 samples/sec Loss 8.3292 LearningRate 0.0791 Epoch: 2 Global Step: 27540 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:16,143-Speed 3299.15 samples/sec Loss 8.3577 LearningRate 0.0790 Epoch: 2 Global Step: 27550 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:19,218-Speed 3330.38 samples/sec Loss 8.3923 LearningRate 0.0790 Epoch: 2 Global Step: 27560 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:08:22,238-Speed 3392.14 samples/sec Loss 8.3936 LearningRate 0.0790 Epoch: 2 
Global Step: 27570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:25,266-Speed 3383.05 samples/sec Loss 8.1948 LearningRate 0.0790 Epoch: 2 Global Step: 27580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:28,285-Speed 3392.53 samples/sec Loss 8.2478 LearningRate 0.0790 Epoch: 2 Global Step: 27590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:31,370-Speed 3321.23 samples/sec Loss 8.2998 LearningRate 0.0790 Epoch: 2 Global Step: 27600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:34,391-Speed 3389.89 samples/sec Loss 8.3160 LearningRate 0.0790 Epoch: 2 Global Step: 27610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:37,486-Speed 3309.44 samples/sec Loss 8.2315 LearningRate 0.0790 Epoch: 2 Global Step: 27620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:40,584-Speed 3306.44 samples/sec Loss 8.3535 LearningRate 0.0790 Epoch: 2 Global Step: 27630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:43,618-Speed 3377.05 samples/sec Loss 8.3849 LearningRate 0.0790 Epoch: 2 Global Step: 27640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:46,644-Speed 3384.90 samples/sec Loss 8.2537 LearningRate 0.0790 Epoch: 2 Global Step: 27650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:49,656-Speed 3400.74 samples/sec Loss 8.3132 LearningRate 0.0790 Epoch: 2 Global Step: 27660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:08:52,686-Speed 3380.32 samples/sec Loss 8.2873 LearningRate 0.0790 Epoch: 2 Global Step: 27670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:08:55,753-Speed 3340.58 samples/sec Loss 8.2645 LearningRate 0.0790 Epoch: 2 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:08:58,769-Speed 3396.60 samples/sec Loss 8.4680 LearningRate 0.0789 Epoch: 2 Global Step: 27690 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:09:01,806-Speed 3372.04 samples/sec Loss 8.3028 LearningRate 0.0789 Epoch: 2 Global Step: 27700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:04,796-Speed 3426.98 samples/sec Loss 8.3551 LearningRate 0.0789 Epoch: 2 Global Step: 27710 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:07,821-Speed 3385.40 samples/sec Loss 8.3462 LearningRate 0.0789 Epoch: 2 Global Step: 27720 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:10,892-Speed 3335.57 samples/sec Loss 8.3496 LearningRate 0.0789 Epoch: 2 Global Step: 27730 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:13,984-Speed 3313.31 samples/sec Loss 8.2407 LearningRate 0.0789 Epoch: 2 Global Step: 27740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:17,034-Speed 3358.34 samples/sec Loss 8.3069 LearningRate 0.0789 Epoch: 2 Global Step: 27750 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:20,042-Speed 3405.37 samples/sec Loss 8.1602 LearningRate 0.0789 Epoch: 2 Global Step: 27760 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:23,078-Speed 3373.67 samples/sec Loss 8.1835 LearningRate 0.0789 Epoch: 2 Global Step: 27770 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:26,090-Speed 3400.05 samples/sec Loss 8.4769 LearningRate 0.0789 Epoch: 2 Global Step: 27780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:29,094-Speed 3410.86 samples/sec Loss 8.2016 LearningRate 0.0789 Epoch: 2 Global Step: 27790 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:32,111-Speed 3395.00 samples/sec Loss 8.2474 LearningRate 0.0789 Epoch: 2 Global Step: 27800 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:09:35,110-Speed 3414.83 samples/sec Loss 8.4018 LearningRate 0.0789 Epoch: 2 Global Step: 27810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:38,149-Speed 3370.81 
samples/sec Loss 8.2479 LearningRate 0.0789 Epoch: 2 Global Step: 27820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:41,180-Speed 3380.38 samples/sec Loss 8.2774 LearningRate 0.0788 Epoch: 2 Global Step: 27830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:44,208-Speed 3381.78 samples/sec Loss 8.3148 LearningRate 0.0788 Epoch: 2 Global Step: 27840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:47,264-Speed 3352.18 samples/sec Loss 8.2189 LearningRate 0.0788 Epoch: 2 Global Step: 27850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:50,321-Speed 3350.15 samples/sec Loss 8.4045 LearningRate 0.0788 Epoch: 2 Global Step: 27860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:53,401-Speed 3326.05 samples/sec Loss 8.3548 LearningRate 0.0788 Epoch: 2 Global Step: 27870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:56,451-Speed 3358.21 samples/sec Loss 8.3282 LearningRate 0.0788 Epoch: 2 Global Step: 27880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:09:59,444-Speed 3423.15 samples/sec Loss 8.2039 LearningRate 0.0788 Epoch: 2 Global Step: 27890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:02,478-Speed 3375.60 samples/sec Loss 8.3409 LearningRate 0.0788 Epoch: 2 Global Step: 27900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:05,558-Speed 3325.87 samples/sec Loss 8.3429 LearningRate 0.0788 Epoch: 2 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:10:08,565-Speed 3405.92 samples/sec Loss 8.3428 LearningRate 0.0788 Epoch: 2 Global Step: 27920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:11,592-Speed 3384.15 samples/sec Loss 8.3640 LearningRate 0.0788 Epoch: 2 Global Step: 27930 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:14,655-Speed 3344.29 samples/sec Loss 8.3462 LearningRate 0.0788 Epoch: 2 
Global Step: 27940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:17,720-Speed 3341.68 samples/sec Loss 8.2155 LearningRate 0.0788 Epoch: 2 Global Step: 27950 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:20,797-Speed 3329.33 samples/sec Loss 8.2817 LearningRate 0.0788 Epoch: 2 Global Step: 27960 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:23,853-Speed 3352.05 samples/sec Loss 8.2512 LearningRate 0.0787 Epoch: 2 Global Step: 27970 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:26,899-Speed 3362.18 samples/sec Loss 8.3786 LearningRate 0.0787 Epoch: 2 Global Step: 27980 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:30,022-Speed 3280.55 samples/sec Loss 8.3030 LearningRate 0.0787 Epoch: 2 Global Step: 27990 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:33,019-Speed 3417.05 samples/sec Loss 8.3687 LearningRate 0.0787 Epoch: 2 Global Step: 28000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:36,042-Speed 3388.69 samples/sec Loss 8.2956 LearningRate 0.0787 Epoch: 2 Global Step: 28010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:39,093-Speed 3357.01 samples/sec Loss 8.2535 LearningRate 0.0787 Epoch: 2 Global Step: 28020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:10:42,190-Speed 3307.50 samples/sec Loss 8.4433 LearningRate 0.0787 Epoch: 2 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:10:45,216-Speed 3384.93 samples/sec Loss 8.3820 LearningRate 0.0787 Epoch: 2 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:10:48,214-Speed 3417.00 samples/sec Loss 8.3187 LearningRate 0.0787 Epoch: 2 Global Step: 28050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:51,221-Speed 3405.92 samples/sec Loss 8.3980 LearningRate 0.0787 Epoch: 2 Global Step: 28060 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:10:54,322-Speed 3303.55 samples/sec Loss 8.2986 LearningRate 0.0787 Epoch: 2 Global Step: 28070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:10:57,352-Speed 3380.88 samples/sec Loss 8.2715 LearningRate 0.0787 Epoch: 2 Global Step: 28080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:00,433-Speed 3324.38 samples/sec Loss 8.3173 LearningRate 0.0787 Epoch: 2 Global Step: 28090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:03,486-Speed 3355.15 samples/sec Loss 8.3577 LearningRate 0.0787 Epoch: 2 Global Step: 28100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:06,614-Speed 3274.72 samples/sec Loss 8.3179 LearningRate 0.0786 Epoch: 2 Global Step: 28110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:09,605-Speed 3424.11 samples/sec Loss 8.3943 LearningRate 0.0786 Epoch: 2 Global Step: 28120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:12,686-Speed 3325.21 samples/sec Loss 8.3479 LearningRate 0.0786 Epoch: 2 Global Step: 28130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:15,725-Speed 3371.02 samples/sec Loss 8.2338 LearningRate 0.0786 Epoch: 2 Global Step: 28140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:18,740-Speed 3397.30 samples/sec Loss 8.2789 LearningRate 0.0786 Epoch: 2 Global Step: 28150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:11:21,759-Speed 3392.86 samples/sec Loss 8.3044 LearningRate 0.0786 Epoch: 2 Global Step: 28160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:11:24,828-Speed 3336.64 samples/sec Loss 8.3168 LearningRate 0.0786 Epoch: 2 Global Step: 28170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:11:27,909-Speed 3325.06 samples/sec Loss 8.3394 LearningRate 0.0786 Epoch: 2 Global Step: 28180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:11:30,949-Speed 3369.77 
samples/sec Loss 8.2304 LearningRate 0.0786 Epoch: 2 Global Step: 28190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:11:33,956-Speed 3405.93 samples/sec Loss 8.4824 LearningRate 0.0786 Epoch: 2 Global Step: 28200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:11:36,956-Speed 3414.71 samples/sec Loss 8.4109 LearningRate 0.0786 Epoch: 2 Global Step: 28210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:39,986-Speed 3380.85 samples/sec Loss 8.4441 LearningRate 0.0786 Epoch: 2 Global Step: 28220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:43,024-Speed 3371.65 samples/sec Loss 8.3600 LearningRate 0.0786 Epoch: 2 Global Step: 28230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:46,052-Speed 3382.16 samples/sec Loss 8.2822 LearningRate 0.0786 Epoch: 2 Global Step: 28240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:49,103-Speed 3357.31 samples/sec Loss 8.2439 LearningRate 0.0785 Epoch: 2 Global Step: 28250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:52,159-Speed 3352.70 samples/sec Loss 8.3291 LearningRate 0.0785 Epoch: 2 Global Step: 28260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:55,213-Speed 3353.39 samples/sec Loss 8.4451 LearningRate 0.0785 Epoch: 2 Global Step: 28270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:11:58,239-Speed 3384.91 samples/sec Loss 8.4497 LearningRate 0.0785 Epoch: 2 Global Step: 28280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:01,287-Speed 3361.10 samples/sec Loss 8.3246 LearningRate 0.0785 Epoch: 2 Global Step: 28290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:04,312-Speed 3385.33 samples/sec Loss 8.3870 LearningRate 0.0785 Epoch: 2 Global Step: 28300 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:07,341-Speed 3382.16 samples/sec Loss 8.3629 LearningRate 0.0785 Epoch: 2 
Global Step: 28310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:12:10,351-Speed 3403.14 samples/sec Loss 8.5255 LearningRate 0.0785 Epoch: 2 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:12:13,379-Speed 3382.34 samples/sec Loss 8.3513 LearningRate 0.0785 Epoch: 2 Global Step: 28330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:16,445-Speed 3341.77 samples/sec Loss 8.4725 LearningRate 0.0785 Epoch: 2 Global Step: 28340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:19,512-Speed 3338.91 samples/sec Loss 8.3987 LearningRate 0.0785 Epoch: 2 Global Step: 28350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:22,541-Speed 3381.93 samples/sec Loss 8.3899 LearningRate 0.0785 Epoch: 2 Global Step: 28360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:25,556-Speed 3398.06 samples/sec Loss 8.3603 LearningRate 0.0785 Epoch: 2 Global Step: 28370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:28,612-Speed 3351.80 samples/sec Loss 8.2480 LearningRate 0.0785 Epoch: 2 Global Step: 28380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:31,669-Speed 3350.78 samples/sec Loss 8.3036 LearningRate 0.0784 Epoch: 2 Global Step: 28390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:34,689-Speed 3391.93 samples/sec Loss 8.3768 LearningRate 0.0784 Epoch: 2 Global Step: 28400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:37,703-Speed 3398.66 samples/sec Loss 8.3337 LearningRate 0.0784 Epoch: 2 Global Step: 28410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:40,745-Speed 3366.54 samples/sec Loss 8.2542 LearningRate 0.0784 Epoch: 2 Global Step: 28420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:12:43,762-Speed 3395.21 samples/sec Loss 8.3832 LearningRate 0.0784 Epoch: 2 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-04-27 04:12:46,809-Speed 3361.80 samples/sec Loss 8.5299 LearningRate 0.0784 Epoch: 2 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:12:49,890-Speed 3324.40 samples/sec Loss 8.3150 LearningRate 0.0784 Epoch: 2 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:12:52,943-Speed 3355.75 samples/sec Loss 8.3183 LearningRate 0.0784 Epoch: 2 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:12:55,958-Speed 3397.73 samples/sec Loss 8.3402 LearningRate 0.0784 Epoch: 2 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:12:58,977-Speed 3392.14 samples/sec Loss 8.3324 LearningRate 0.0784 Epoch: 2 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:02,057-Speed 3326.09 samples/sec Loss 8.2944 LearningRate 0.0784 Epoch: 2 Global Step: 28490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:05,095-Speed 3371.97 samples/sec Loss 8.4748 LearningRate 0.0784 Epoch: 2 Global Step: 28500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:08,114-Speed 3392.51 samples/sec Loss 8.3480 LearningRate 0.0784 Epoch: 2 Global Step: 28510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:11,158-Speed 3365.80 samples/sec Loss 8.4724 LearningRate 0.0784 Epoch: 2 Global Step: 28520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:14,234-Speed 3329.75 samples/sec Loss 8.4937 LearningRate 0.0783 Epoch: 2 Global Step: 28530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:17,314-Speed 3325.82 samples/sec Loss 8.3653 LearningRate 0.0783 Epoch: 2 Global Step: 28540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:20,356-Speed 3366.70 samples/sec Loss 8.2732 LearningRate 0.0783 Epoch: 2 Global Step: 28550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:23,375-Speed 3392.95 
samples/sec Loss 8.3427 LearningRate 0.0783 Epoch: 2 Global Step: 28560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:26,383-Speed 3406.31 samples/sec Loss 8.4411 LearningRate 0.0783 Epoch: 2 Global Step: 28570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:29,389-Speed 3406.67 samples/sec Loss 8.3953 LearningRate 0.0783 Epoch: 2 Global Step: 28580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:32,386-Speed 3418.29 samples/sec Loss 8.3418 LearningRate 0.0783 Epoch: 2 Global Step: 28590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:35,412-Speed 3385.09 samples/sec Loss 8.4312 LearningRate 0.0783 Epoch: 2 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:38,520-Speed 3295.31 samples/sec Loss 8.4879 LearningRate 0.0783 Epoch: 2 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:41,583-Speed 3344.52 samples/sec Loss 8.2991 LearningRate 0.0783 Epoch: 2 Global Step: 28620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:44,594-Speed 3401.56 samples/sec Loss 8.3812 LearningRate 0.0783 Epoch: 2 Global Step: 28630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:47,643-Speed 3359.62 samples/sec Loss 8.3685 LearningRate 0.0783 Epoch: 2 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:50,694-Speed 3357.40 samples/sec Loss 8.3439 LearningRate 0.0783 Epoch: 2 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:53,766-Speed 3334.52 samples/sec Loss 8.4163 LearningRate 0.0783 Epoch: 2 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:13:56,763-Speed 3418.26 samples/sec Loss 8.3767 LearningRate 0.0783 Epoch: 2 Global Step: 28670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:13:59,762-Speed 3415.16 samples/sec Loss 8.3698 LearningRate 0.0782 Epoch: 2 
Global Step: 28680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:02,765-Speed 3410.71 samples/sec Loss 8.2274 LearningRate 0.0782 Epoch: 2 Global Step: 28690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:05,807-Speed 3367.77 samples/sec Loss 8.4007 LearningRate 0.0782 Epoch: 2 Global Step: 28700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:08,799-Speed 3423.99 samples/sec Loss 8.3577 LearningRate 0.0782 Epoch: 2 Global Step: 28710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:11,878-Speed 3326.22 samples/sec Loss 8.2448 LearningRate 0.0782 Epoch: 2 Global Step: 28720 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:14,959-Speed 3324.85 samples/sec Loss 8.4621 LearningRate 0.0782 Epoch: 2 Global Step: 28730 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:17,976-Speed 3394.47 samples/sec Loss 8.4714 LearningRate 0.0782 Epoch: 2 Global Step: 28740 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:20,990-Speed 3398.91 samples/sec Loss 8.3998 LearningRate 0.0782 Epoch: 2 Global Step: 28750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:24,006-Speed 3396.31 samples/sec Loss 8.3298 LearningRate 0.0782 Epoch: 2 Global Step: 28760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:27,002-Speed 3418.82 samples/sec Loss 8.4942 LearningRate 0.0782 Epoch: 2 Global Step: 28770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:30,093-Speed 3313.68 samples/sec Loss 8.4688 LearningRate 0.0782 Epoch: 2 Global Step: 28780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:33,133-Speed 3369.66 samples/sec Loss 8.5069 LearningRate 0.0782 Epoch: 2 Global Step: 28790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:36,170-Speed 3373.77 samples/sec Loss 8.3842 LearningRate 0.0782 Epoch: 2 Global Step: 28800 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:14:39,209-Speed 3369.73 samples/sec Loss 8.3533 LearningRate 0.0782 Epoch: 2 Global Step: 28810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:42,321-Speed 3292.16 samples/sec Loss 8.3841 LearningRate 0.0781 Epoch: 2 Global Step: 28820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:45,356-Speed 3374.68 samples/sec Loss 8.3939 LearningRate 0.0781 Epoch: 2 Global Step: 28830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:48,461-Speed 3299.53 samples/sec Loss 8.4377 LearningRate 0.0781 Epoch: 2 Global Step: 28840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:51,502-Speed 3367.91 samples/sec Loss 8.4262 LearningRate 0.0781 Epoch: 2 Global Step: 28850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:54,513-Speed 3402.90 samples/sec Loss 8.3815 LearningRate 0.0781 Epoch: 2 Global Step: 28860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:14:57,549-Speed 3373.39 samples/sec Loss 8.4595 LearningRate 0.0781 Epoch: 2 Global Step: 28870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:00,562-Speed 3399.26 samples/sec Loss 8.3679 LearningRate 0.0781 Epoch: 2 Global Step: 28880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:03,598-Speed 3374.95 samples/sec Loss 8.3173 LearningRate 0.0781 Epoch: 2 Global Step: 28890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:06,623-Speed 3385.65 samples/sec Loss 8.3582 LearningRate 0.0781 Epoch: 2 Global Step: 28900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:09,633-Speed 3403.20 samples/sec Loss 8.3072 LearningRate 0.0781 Epoch: 2 Global Step: 28910 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:12,718-Speed 3320.32 samples/sec Loss 8.4800 LearningRate 0.0781 Epoch: 2 Global Step: 28920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:15,783-Speed 3342.74 
samples/sec Loss 8.4364 LearningRate 0.0781 Epoch: 2 Global Step: 28930 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:18,815-Speed 3378.12 samples/sec Loss 8.4733 LearningRate 0.0781 Epoch: 2 Global Step: 28940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:21,803-Speed 3428.07 samples/sec Loss 8.2661 LearningRate 0.0781 Epoch: 2 Global Step: 28950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:24,846-Speed 3366.57 samples/sec Loss 8.4546 LearningRate 0.0780 Epoch: 2 Global Step: 28960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:27,861-Speed 3397.51 samples/sec Loss 8.3380 LearningRate 0.0780 Epoch: 2 Global Step: 28970 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:30,954-Speed 3310.85 samples/sec Loss 8.4865 LearningRate 0.0780 Epoch: 2 Global Step: 28980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:34,009-Speed 3353.78 samples/sec Loss 8.4682 LearningRate 0.0780 Epoch: 2 Global Step: 28990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:37,130-Speed 3282.08 samples/sec Loss 8.4195 LearningRate 0.0780 Epoch: 2 Global Step: 29000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:15:40,160-Speed 3380.49 samples/sec Loss 8.4303 LearningRate 0.0780 Epoch: 2 Global Step: 29010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:43,209-Speed 3359.50 samples/sec Loss 8.4834 LearningRate 0.0780 Epoch: 2 Global Step: 29020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:46,236-Speed 3383.32 samples/sec Loss 8.4649 LearningRate 0.0780 Epoch: 2 Global Step: 29030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:49,267-Speed 3379.80 samples/sec Loss 8.4813 LearningRate 0.0780 Epoch: 2 Global Step: 29040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:52,269-Speed 3412.55 samples/sec Loss 8.4114 LearningRate 0.0780 Epoch: 2 
Global Step: 29050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:55,303-Speed 3375.79 samples/sec Loss 8.4263 LearningRate 0.0780 Epoch: 2 Global Step: 29060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:15:58,325-Speed 3389.54 samples/sec Loss 8.2878 LearningRate 0.0780 Epoch: 2 Global Step: 29070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:01,380-Speed 3352.47 samples/sec Loss 8.3969 LearningRate 0.0780 Epoch: 2 Global Step: 29080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:04,444-Speed 3343.56 samples/sec Loss 8.3365 LearningRate 0.0780 Epoch: 2 Global Step: 29090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:07,540-Speed 3308.38 samples/sec Loss 8.4850 LearningRate 0.0779 Epoch: 2 Global Step: 29100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:10,552-Speed 3400.70 samples/sec Loss 8.4048 LearningRate 0.0779 Epoch: 2 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:16:13,587-Speed 3374.64 samples/sec Loss 8.3541 LearningRate 0.0779 Epoch: 2 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:16:16,607-Speed 3392.52 samples/sec Loss 8.3597 LearningRate 0.0779 Epoch: 2 Global Step: 29130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:19,685-Speed 3327.64 samples/sec Loss 8.4108 LearningRate 0.0779 Epoch: 2 Global Step: 29140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:22,700-Speed 3397.32 samples/sec Loss 8.4733 LearningRate 0.0779 Epoch: 2 Global Step: 29150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:25,746-Speed 3363.22 samples/sec Loss 8.3276 LearningRate 0.0779 Epoch: 2 Global Step: 29160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:28,754-Speed 3405.58 samples/sec Loss 8.5211 LearningRate 0.0779 Epoch: 2 Global Step: 29170 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:16:31,817-Speed 3344.32 samples/sec Loss 8.3719 LearningRate 0.0779 Epoch: 2 Global Step: 29180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:34,833-Speed 3395.55 samples/sec Loss 8.5845 LearningRate 0.0779 Epoch: 2 Global Step: 29190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:37,926-Speed 3312.48 samples/sec Loss 8.4057 LearningRate 0.0779 Epoch: 2 Global Step: 29200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:40,944-Speed 3393.33 samples/sec Loss 8.5418 LearningRate 0.0779 Epoch: 2 Global Step: 29210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:43,952-Speed 3405.36 samples/sec Loss 8.4220 LearningRate 0.0779 Epoch: 2 Global Step: 29220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:16:46,972-Speed 3392.76 samples/sec Loss 8.4693 LearningRate 0.0779 Epoch: 2 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:16:50,097-Speed 3277.27 samples/sec Loss 8.3287 LearningRate 0.0778 Epoch: 2 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:16:53,129-Speed 3378.88 samples/sec Loss 8.4219 LearningRate 0.0778 Epoch: 2 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:16:56,158-Speed 3381.95 samples/sec Loss 8.4972 LearningRate 0.0778 Epoch: 2 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:16:59,162-Speed 3409.20 samples/sec Loss 8.4668 LearningRate 0.0778 Epoch: 2 Global Step: 29270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:17:02,216-Speed 3353.99 samples/sec Loss 8.3807 LearningRate 0.0778 Epoch: 2 Global Step: 29280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:17:05,300-Speed 3321.25 samples/sec Loss 8.3931 LearningRate 0.0778 Epoch: 2 Global Step: 29290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:17:08,333-Speed 3377.62 
samples/sec Loss 8.3932 LearningRate 0.0778 Epoch: 2 Global Step: 29300 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:11,361-Speed 3382.61 samples/sec Loss 8.5489 LearningRate 0.0778 Epoch: 2 Global Step: 29310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:14,416-Speed 3353.21 samples/sec Loss 8.3958 LearningRate 0.0778 Epoch: 2 Global Step: 29320 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:17,483-Speed 3339.08 samples/sec Loss 8.4088 LearningRate 0.0778 Epoch: 2 Global Step: 29330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:20,533-Speed 3359.33 samples/sec Loss 8.3734 LearningRate 0.0778 Epoch: 2 Global Step: 29340 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:23,576-Speed 3366.02 samples/sec Loss 8.3970 LearningRate 0.0778 Epoch: 2 Global Step: 29350 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:26,617-Speed 3367.44 samples/sec Loss 8.3871 LearningRate 0.0778 Epoch: 2 Global Step: 29360 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:29,635-Speed 3394.55 samples/sec Loss 8.4792 LearningRate 0.0778 Epoch: 2 Global Step: 29370 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:32,638-Speed 3410.34 samples/sec Loss 8.3580 LearningRate 0.0777 Epoch: 2 Global Step: 29380 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:35,655-Speed 3395.52 samples/sec Loss 8.4120 LearningRate 0.0777 Epoch: 2 Global Step: 29390 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:38,770-Speed 3288.89 samples/sec Loss 8.4355 LearningRate 0.0777 Epoch: 2 Global Step: 29400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:17:41,814-Speed 3364.84 samples/sec Loss 8.4549 LearningRate 0.0777 Epoch: 2 Global Step: 29410 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:44,809-Speed 3420.30 samples/sec Loss 8.4332 LearningRate 0.0777 Epoch: 2 
Global Step: 29420 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:47,861-Speed 3355.69 samples/sec Loss 8.3765 LearningRate 0.0777 Epoch: 2 Global Step: 29430 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:50,922-Speed 3347.01 samples/sec Loss 8.5056 LearningRate 0.0777 Epoch: 2 Global Step: 29440 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:54,018-Speed 3309.04 samples/sec Loss 8.3388 LearningRate 0.0777 Epoch: 2 Global Step: 29450 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:17:57,083-Speed 3341.76 samples/sec Loss 8.5000 LearningRate 0.0777 Epoch: 2 Global Step: 29460 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:18:00,092-Speed 3403.13 samples/sec Loss 8.5296 LearningRate 0.0777 Epoch: 2 Global Step: 29470 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:18:03,189-Speed 3307.35 samples/sec Loss 8.3458 LearningRate 0.0777 Epoch: 2 Global Step: 29480 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:18:06,243-Speed 3355.20 samples/sec Loss 8.3382 LearningRate 0.0777 Epoch: 2 Global Step: 29490 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:18:09,238-Speed 3420.00 samples/sec Loss 8.3297 LearningRate 0.0777 Epoch: 2 Global Step: 29500 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:18:12,266-Speed 3382.41 samples/sec Loss 8.4092 LearningRate 0.0777 Epoch: 2 Global Step: 29510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:15,295-Speed 3381.35 samples/sec Loss 8.4009 LearningRate 0.0776 Epoch: 2 Global Step: 29520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:18,423-Speed 3274.82 samples/sec Loss 8.3303 LearningRate 0.0776 Epoch: 2 Global Step: 29530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:21,459-Speed 3374.22 samples/sec Loss 8.4609 LearningRate 0.0776 Epoch: 2 Global Step: 29540 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:18:24,529-Speed 3336.62 samples/sec Loss 8.3352 LearningRate 0.0776 Epoch: 2 Global Step: 29550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:27,562-Speed 3376.70 samples/sec Loss 8.4794 LearningRate 0.0776 Epoch: 2 Global Step: 29560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:30,644-Speed 3323.88 samples/sec Loss 8.4715 LearningRate 0.0776 Epoch: 2 Global Step: 29570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:33,682-Speed 3372.06 samples/sec Loss 8.5037 LearningRate 0.0776 Epoch: 2 Global Step: 29580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:36,738-Speed 3351.77 samples/sec Loss 8.3840 LearningRate 0.0776 Epoch: 2 Global Step: 29590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:39,746-Speed 3404.42 samples/sec Loss 8.4168 LearningRate 0.0776 Epoch: 2 Global Step: 29600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:42,777-Speed 3380.05 samples/sec Loss 8.4139 LearningRate 0.0776 Epoch: 2 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:18:45,793-Speed 3395.85 samples/sec Loss 8.4705 LearningRate 0.0776 Epoch: 2 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:18:48,798-Speed 3409.16 samples/sec Loss 8.4865 LearningRate 0.0776 Epoch: 2 Global Step: 29630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:18:51,843-Speed 3364.32 samples/sec Loss 8.3468 LearningRate 0.0776 Epoch: 2 Global Step: 29640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:54,876-Speed 3376.55 samples/sec Loss 8.3764 LearningRate 0.0776 Epoch: 2 Global Step: 29650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:18:57,869-Speed 3422.78 samples/sec Loss 8.4889 LearningRate 0.0775 Epoch: 2 Global Step: 29660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:00,919-Speed 3358.71 
samples/sec Loss 8.5087 LearningRate 0.0775 Epoch: 2 Global Step: 29670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:03,954-Speed 3374.55 samples/sec Loss 8.4288 LearningRate 0.0775 Epoch: 2 Global Step: 29680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:06,956-Speed 3413.04 samples/sec Loss 8.3953 LearningRate 0.0775 Epoch: 2 Global Step: 29690 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:09,957-Speed 3412.26 samples/sec Loss 8.4378 LearningRate 0.0775 Epoch: 2 Global Step: 29700 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:13,002-Speed 3364.32 samples/sec Loss 8.3958 LearningRate 0.0775 Epoch: 2 Global Step: 29710 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:16,090-Speed 3317.17 samples/sec Loss 8.4687 LearningRate 0.0775 Epoch: 2 Global Step: 29720 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:19,143-Speed 3355.73 samples/sec Loss 8.4147 LearningRate 0.0775 Epoch: 2 Global Step: 29730 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:22,157-Speed 3398.87 samples/sec Loss 8.3758 LearningRate 0.0775 Epoch: 2 Global Step: 29740 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:25,211-Speed 3353.52 samples/sec Loss 8.4006 LearningRate 0.0775 Epoch: 2 Global Step: 29750 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:28,229-Speed 3394.17 samples/sec Loss 8.2725 LearningRate 0.0775 Epoch: 2 Global Step: 29760 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:31,327-Speed 3306.54 samples/sec Loss 8.4639 LearningRate 0.0775 Epoch: 2 Global Step: 29770 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:34,400-Speed 3332.98 samples/sec Loss 8.3900 LearningRate 0.0775 Epoch: 2 Global Step: 29780 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:19:37,414-Speed 3399.10 samples/sec Loss 8.4298 LearningRate 0.0775 Epoch: 2 
Global Step: 29790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:40,431-Speed 3395.22 samples/sec Loss 8.4025 LearningRate 0.0774 Epoch: 2 Global Step: 29800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:43,448-Speed 3395.08 samples/sec Loss 8.3904 LearningRate 0.0774 Epoch: 2 Global Step: 29810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:46,451-Speed 3411.32 samples/sec Loss 8.5249 LearningRate 0.0774 Epoch: 2 Global Step: 29820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:49,502-Speed 3356.64 samples/sec Loss 8.3461 LearningRate 0.0774 Epoch: 2 Global Step: 29830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:52,526-Speed 3387.21 samples/sec Loss 8.4746 LearningRate 0.0774 Epoch: 2 Global Step: 29840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:55,585-Speed 3349.61 samples/sec Loss 8.4294 LearningRate 0.0774 Epoch: 2 Global Step: 29850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:19:58,582-Speed 3417.13 samples/sec Loss 8.2852 LearningRate 0.0774 Epoch: 2 Global Step: 29860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:20:01,605-Speed 3388.26 samples/sec Loss 8.4171 LearningRate 0.0774 Epoch: 2 Global Step: 29870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:20:04,675-Speed 3336.81 samples/sec Loss 8.3928 LearningRate 0.0774 Epoch: 2 Global Step: 29880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:20:07,727-Speed 3355.63 samples/sec Loss 8.4798 LearningRate 0.0774 Epoch: 2 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:20:10,769-Speed 3367.31 samples/sec Loss 8.3747 LearningRate 0.0774 Epoch: 2 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:20:13,797-Speed 3383.76 samples/sec Loss 8.4823 LearningRate 0.0774 Epoch: 2 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-04-27 04:20:16,844-Speed 3361.52 samples/sec Loss 8.3366 LearningRate 0.0774 Epoch: 2 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:20:19,868-Speed 3387.50 samples/sec Loss 8.4397 LearningRate 0.0774 Epoch: 2 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:20:22,917-Speed 3360.00 samples/sec Loss 8.2529 LearningRate 0.0773 Epoch: 2 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:20:25,931-Speed 3398.35 samples/sec Loss 8.4465 LearningRate 0.0773 Epoch: 2 Global Step: 29950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:28,963-Speed 3378.91 samples/sec Loss 8.2251 LearningRate 0.0773 Epoch: 2 Global Step: 29960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:32,004-Speed 3368.61 samples/sec Loss 8.3968 LearningRate 0.0773 Epoch: 2 Global Step: 29970 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:35,049-Speed 3364.09 samples/sec Loss 8.4022 LearningRate 0.0773 Epoch: 2 Global Step: 29980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:38,119-Speed 3336.32 samples/sec Loss 8.5302 LearningRate 0.0773 Epoch: 2 Global Step: 29990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:41,147-Speed 3382.42 samples/sec Loss 8.4842 LearningRate 0.0773 Epoch: 2 Global Step: 30000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:44,189-Speed 3367.57 samples/sec Loss 8.5327 LearningRate 0.0773 Epoch: 2 Global Step: 30010 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:47,197-Speed 3405.58 samples/sec Loss 8.4237 LearningRate 0.0773 Epoch: 2 Global Step: 30020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:50,237-Speed 3369.54 samples/sec Loss 8.3699 LearningRate 0.0773 Epoch: 2 Global Step: 30030 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:53,289-Speed 3356.36 
samples/sec Loss 8.4548 LearningRate 0.0773 Epoch: 2 Global Step: 30040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:20:56,294-Speed 3408.32 samples/sec Loss 8.4221 LearningRate 0.0773 Epoch: 2 Global Step: 30050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:20:59,316-Speed 3390.04 samples/sec Loss 8.4190 LearningRate 0.0773 Epoch: 2 Global Step: 30060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:02,345-Speed 3381.72 samples/sec Loss 8.3799 LearningRate 0.0773 Epoch: 2 Global Step: 30070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:05,430-Speed 3320.85 samples/sec Loss 8.3197 LearningRate 0.0772 Epoch: 2 Global Step: 30080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:08,496-Speed 3340.36 samples/sec Loss 8.4114 LearningRate 0.0772 Epoch: 2 Global Step: 30090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:11,585-Speed 3315.63 samples/sec Loss 8.5202 LearningRate 0.0772 Epoch: 2 Global Step: 30100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:14,638-Speed 3355.94 samples/sec Loss 8.4798 LearningRate 0.0772 Epoch: 2 Global Step: 30110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:17,740-Speed 3301.69 samples/sec Loss 8.4160 LearningRate 0.0772 Epoch: 2 Global Step: 30120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:20,774-Speed 3375.55 samples/sec Loss 8.5384 LearningRate 0.0772 Epoch: 2 Global Step: 30130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:23,795-Speed 3391.48 samples/sec Loss 8.5427 LearningRate 0.0772 Epoch: 2 Global Step: 30140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:26,822-Speed 3384.27 samples/sec Loss 8.3561 LearningRate 0.0772 Epoch: 2 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:21:29,841-Speed 3392.01 samples/sec Loss 8.4888 LearningRate 0.0772 Epoch: 2 
Global Step: 30160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:21:32,855-Speed 3398.88 samples/sec Loss 8.5245 LearningRate 0.0772 Epoch: 2 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:21:35,946-Speed 3314.12 samples/sec Loss 8.5348 LearningRate 0.0772 Epoch: 2 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:21:39,018-Speed 3334.58 samples/sec Loss 8.5691 LearningRate 0.0772 Epoch: 2 Global Step: 30190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:42,061-Speed 3366.42 samples/sec Loss 8.4088 LearningRate 0.0772 Epoch: 2 Global Step: 30200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:21:45,097-Speed 3373.48 samples/sec Loss 8.3307 LearningRate 0.0772 Epoch: 2 Global Step: 30210 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:21:48,154-Speed 3350.97 samples/sec Loss 8.4270 LearningRate 0.0772 Epoch: 2 Global Step: 30220 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:21:51,182-Speed 3383.18 samples/sec Loss 8.3823 LearningRate 0.0771 Epoch: 2 Global Step: 30230 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:21:54,233-Speed 3357.48 samples/sec Loss 8.3391 LearningRate 0.0771 Epoch: 2 Global Step: 30240 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:21:57,246-Speed 3399.10 samples/sec Loss 8.5387 LearningRate 0.0771 Epoch: 2 Global Step: 30250 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:22:00,308-Speed 3345.81 samples/sec Loss 8.4194 LearningRate 0.0771 Epoch: 2 Global Step: 30260 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:22:03,335-Speed 3384.29 samples/sec Loss 8.3780 LearningRate 0.0771 Epoch: 2 Global Step: 30270 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:22:06,405-Speed 3336.60 samples/sec Loss 8.4798 LearningRate 0.0771 Epoch: 2 Global Step: 30280 Fp16 Grad Scale: 16384 Required: 19 
hours Training: 2022-04-27 04:22:09,411-Speed 3407.54 samples/sec Loss 8.5563 LearningRate 0.0771 Epoch: 2 Global Step: 30290 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:22:12,427-Speed 3396.40 samples/sec Loss 8.3332 LearningRate 0.0771 Epoch: 2 Global Step: 30300 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:22:15,466-Speed 3370.37 samples/sec Loss 8.5129 LearningRate 0.0771 Epoch: 2 Global Step: 30310 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:18,484-Speed 3394.06 samples/sec Loss 8.3595 LearningRate 0.0771 Epoch: 2 Global Step: 30320 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:21,478-Speed 3421.44 samples/sec Loss 8.3648 LearningRate 0.0771 Epoch: 2 Global Step: 30330 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:24,559-Speed 3324.77 samples/sec Loss 8.4508 LearningRate 0.0771 Epoch: 2 Global Step: 30340 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:27,627-Speed 3338.91 samples/sec Loss 8.4158 LearningRate 0.0771 Epoch: 2 Global Step: 30350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:30,701-Speed 3332.31 samples/sec Loss 8.3932 LearningRate 0.0771 Epoch: 2 Global Step: 30360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:33,691-Speed 3425.04 samples/sec Loss 8.5434 LearningRate 0.0770 Epoch: 2 Global Step: 30370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:36,706-Speed 3396.96 samples/sec Loss 8.4650 LearningRate 0.0770 Epoch: 2 Global Step: 30380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:39,719-Speed 3400.33 samples/sec Loss 8.4962 LearningRate 0.0770 Epoch: 2 Global Step: 30390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:42,762-Speed 3366.37 samples/sec Loss 8.3870 LearningRate 0.0770 Epoch: 2 Global Step: 30400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:22:45,772-Speed 3402.34 
samples/sec Loss 8.3883 LearningRate 0.0770 Epoch: 2 Global Step: 30410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:22:48,811-Speed 3370.78 samples/sec Loss 8.4029 LearningRate 0.0770 Epoch: 2 Global Step: 30420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:22:51,829-Speed 3394.30 samples/sec Loss 8.3872 LearningRate 0.0770 Epoch: 2 Global Step: 30430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:22:54,907-Speed 3328.04 samples/sec Loss 8.3885 LearningRate 0.0770 Epoch: 2 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:22:57,933-Speed 3384.19 samples/sec Loss 8.4674 LearningRate 0.0770 Epoch: 2 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:00,970-Speed 3373.14 samples/sec Loss 8.5014 LearningRate 0.0770 Epoch: 2 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:04,056-Speed 3318.98 samples/sec Loss 8.5100 LearningRate 0.0770 Epoch: 2 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:07,088-Speed 3379.20 samples/sec Loss 8.4706 LearningRate 0.0770 Epoch: 2 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:10,083-Speed 3419.51 samples/sec Loss 8.6286 LearningRate 0.0770 Epoch: 2 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:13,101-Speed 3394.82 samples/sec Loss 8.5627 LearningRate 0.0770 Epoch: 2 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:16,109-Speed 3404.86 samples/sec Loss 8.4317 LearningRate 0.0769 Epoch: 2 Global Step: 30510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:19,110-Speed 3413.36 samples/sec Loss 8.5628 LearningRate 0.0769 Epoch: 2 Global Step: 30520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:22,142-Speed 3378.27 samples/sec Loss 8.3462 LearningRate 0.0769 Epoch: 2 
Global Step: 30530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:25,181-Speed 3371.08 samples/sec Loss 8.4358 LearningRate 0.0769 Epoch: 2 Global Step: 30540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:28,234-Speed 3354.36 samples/sec Loss 8.4773 LearningRate 0.0769 Epoch: 2 Global Step: 30550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:31,251-Speed 3395.78 samples/sec Loss 8.4394 LearningRate 0.0769 Epoch: 2 Global Step: 30560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:34,273-Speed 3389.87 samples/sec Loss 8.4248 LearningRate 0.0769 Epoch: 2 Global Step: 30570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:37,304-Speed 3379.63 samples/sec Loss 8.4626 LearningRate 0.0769 Epoch: 2 Global Step: 30580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:40,309-Speed 3409.15 samples/sec Loss 8.3673 LearningRate 0.0769 Epoch: 2 Global Step: 30590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:43,355-Speed 3363.29 samples/sec Loss 8.5156 LearningRate 0.0769 Epoch: 2 Global Step: 30600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:46,383-Speed 3382.53 samples/sec Loss 8.4073 LearningRate 0.0769 Epoch: 2 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:49,519-Speed 3265.99 samples/sec Loss 8.4218 LearningRate 0.0769 Epoch: 2 Global Step: 30620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:52,539-Speed 3392.69 samples/sec Loss 8.3844 LearningRate 0.0769 Epoch: 2 Global Step: 30630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:23:55,586-Speed 3361.16 samples/sec Loss 8.6503 LearningRate 0.0769 Epoch: 2 Global Step: 30640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:23:58,611-Speed 3386.13 samples/sec Loss 8.5416 LearningRate 0.0768 Epoch: 2 Global Step: 30650 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:24:01,709-Speed 3306.73 samples/sec Loss 8.5201 LearningRate 0.0768 Epoch: 2 Global Step: 30660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:04,741-Speed 3378.54 samples/sec Loss 8.3512 LearningRate 0.0768 Epoch: 2 Global Step: 30670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:07,776-Speed 3375.22 samples/sec Loss 8.4317 LearningRate 0.0768 Epoch: 2 Global Step: 30680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:10,773-Speed 3417.71 samples/sec Loss 8.3289 LearningRate 0.0768 Epoch: 2 Global Step: 30690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:13,794-Speed 3391.14 samples/sec Loss 8.5304 LearningRate 0.0768 Epoch: 2 Global Step: 30700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:16,853-Speed 3349.01 samples/sec Loss 8.4965 LearningRate 0.0768 Epoch: 2 Global Step: 30710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:19,881-Speed 3382.71 samples/sec Loss 8.4075 LearningRate 0.0768 Epoch: 2 Global Step: 30720 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:22,919-Speed 3372.22 samples/sec Loss 8.4517 LearningRate 0.0768 Epoch: 2 Global Step: 30730 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:25,961-Speed 3366.86 samples/sec Loss 8.3511 LearningRate 0.0768 Epoch: 2 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:24:29,011-Speed 3357.84 samples/sec Loss 8.4806 LearningRate 0.0768 Epoch: 2 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:24:32,099-Speed 3317.80 samples/sec Loss 8.3710 LearningRate 0.0768 Epoch: 2 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:24:35,113-Speed 3398.24 samples/sec Loss 8.4009 LearningRate 0.0768 Epoch: 2 Global Step: 30770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:38,160-Speed 3362.00 
samples/sec Loss 8.4018 LearningRate 0.0768 Epoch: 2 Global Step: 30780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:41,225-Speed 3342.44 samples/sec Loss 8.3789 LearningRate 0.0767 Epoch: 2 Global Step: 30790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:44,250-Speed 3386.29 samples/sec Loss 8.4224 LearningRate 0.0767 Epoch: 2 Global Step: 30800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:47,269-Speed 3392.55 samples/sec Loss 8.5087 LearningRate 0.0767 Epoch: 2 Global Step: 30810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:50,306-Speed 3372.22 samples/sec Loss 8.5032 LearningRate 0.0767 Epoch: 2 Global Step: 30820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:53,383-Speed 3329.79 samples/sec Loss 8.3988 LearningRate 0.0767 Epoch: 2 Global Step: 30830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:56,416-Speed 3377.19 samples/sec Loss 8.4286 LearningRate 0.0767 Epoch: 2 Global Step: 30840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:24:59,451-Speed 3374.65 samples/sec Loss 8.4016 LearningRate 0.0767 Epoch: 2 Global Step: 30850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:25:02,498-Speed 3361.93 samples/sec Loss 8.5830 LearningRate 0.0767 Epoch: 2 Global Step: 30860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:25:05,520-Speed 3389.65 samples/sec Loss 8.3577 LearningRate 0.0767 Epoch: 2 Global Step: 30870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:25:08,523-Speed 3411.39 samples/sec Loss 8.4397 LearningRate 0.0767 Epoch: 2 Global Step: 30880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:25:11,537-Speed 3398.34 samples/sec Loss 8.3815 LearningRate 0.0767 Epoch: 2 Global Step: 30890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:25:14,572-Speed 3374.74 samples/sec Loss 8.5388 LearningRate 0.0767 Epoch: 2 
Global Step: 30900 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:17,613-Speed 3369.10 samples/sec Loss 8.4316 LearningRate 0.0767 Epoch: 2 Global Step: 30910 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:20,626-Speed 3399.49 samples/sec Loss 8.4303 LearningRate 0.0767 Epoch: 2 Global Step: 30920 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:23,690-Speed 3342.78 samples/sec Loss 8.4315 LearningRate 0.0766 Epoch: 2 Global Step: 30930 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:26,727-Speed 3372.95 samples/sec Loss 8.6167 LearningRate 0.0766 Epoch: 2 Global Step: 30940 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:29,788-Speed 3346.54 samples/sec Loss 8.4876 LearningRate 0.0766 Epoch: 2 Global Step: 30950 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:32,856-Speed 3337.57 samples/sec Loss 8.3018 LearningRate 0.0766 Epoch: 2 Global Step: 30960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:35,971-Speed 3288.93 samples/sec Loss 8.5723 LearningRate 0.0766 Epoch: 2 Global Step: 30970 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:39,023-Speed 3356.24 samples/sec Loss 8.3451 LearningRate 0.0766 Epoch: 2 Global Step: 30980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:42,154-Speed 3271.96 samples/sec Loss 8.5362 LearningRate 0.0766 Epoch: 2 Global Step: 30990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:25:45,189-Speed 3375.28 samples/sec Loss 8.3801 LearningRate 0.0766 Epoch: 2 Global Step: 31000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:25:48,252-Speed 3343.76 samples/sec Loss 8.4679 LearningRate 0.0766 Epoch: 2 Global Step: 31010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:25:51,267-Speed 3397.78 samples/sec Loss 8.4185 LearningRate 0.0766 Epoch: 2 Global Step: 31020 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:25:54,352-Speed 3320.62 samples/sec Loss 8.4730 LearningRate 0.0766 Epoch: 2 Global Step: 31030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:25:57,373-Speed 3390.05 samples/sec Loss 8.3790 LearningRate 0.0766 Epoch: 2 Global Step: 31040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:00,464-Speed 3314.26 samples/sec Loss 8.3976 LearningRate 0.0766 Epoch: 2 Global Step: 31050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:03,554-Speed 3314.90 samples/sec Loss 8.4425 LearningRate 0.0766 Epoch: 2 Global Step: 31060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:06,643-Speed 3315.92 samples/sec Loss 8.4308 LearningRate 0.0766 Epoch: 2 Global Step: 31070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:09,668-Speed 3385.94 samples/sec Loss 8.3642 LearningRate 0.0765 Epoch: 2 Global Step: 31080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:12,681-Speed 3400.47 samples/sec Loss 8.5024 LearningRate 0.0765 Epoch: 2 Global Step: 31090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:15,716-Speed 3374.74 samples/sec Loss 8.4510 LearningRate 0.0765 Epoch: 2 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:18,861-Speed 3256.80 samples/sec Loss 8.4692 LearningRate 0.0765 Epoch: 2 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:21,872-Speed 3401.85 samples/sec Loss 8.4126 LearningRate 0.0765 Epoch: 2 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:24,917-Speed 3364.26 samples/sec Loss 8.5006 LearningRate 0.0765 Epoch: 2 Global Step: 31130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:27,952-Speed 3375.00 samples/sec Loss 8.4802 LearningRate 0.0765 Epoch: 2 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:31,031-Speed 3326.59 
samples/sec Loss 8.3995 LearningRate 0.0765 Epoch: 2 Global Step: 31150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:34,069-Speed 3372.67 samples/sec Loss 8.3862 LearningRate 0.0765 Epoch: 2 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:37,173-Speed 3299.53 samples/sec Loss 8.3673 LearningRate 0.0765 Epoch: 2 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:26:40,212-Speed 3369.94 samples/sec Loss 8.3773 LearningRate 0.0765 Epoch: 2 Global Step: 31180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:43,251-Speed 3370.70 samples/sec Loss 8.3540 LearningRate 0.0765 Epoch: 2 Global Step: 31190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:46,258-Speed 3406.59 samples/sec Loss 8.5348 LearningRate 0.0765 Epoch: 2 Global Step: 31200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:49,297-Speed 3370.36 samples/sec Loss 8.5786 LearningRate 0.0765 Epoch: 2 Global Step: 31210 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:52,383-Speed 3320.49 samples/sec Loss 8.4518 LearningRate 0.0764 Epoch: 2 Global Step: 31220 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:55,454-Speed 3334.62 samples/sec Loss 8.4125 LearningRate 0.0764 Epoch: 2 Global Step: 31230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:26:58,494-Speed 3369.71 samples/sec Loss 8.4253 LearningRate 0.0764 Epoch: 2 Global Step: 31240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:01,496-Speed 3412.52 samples/sec Loss 8.4914 LearningRate 0.0764 Epoch: 2 Global Step: 31250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:04,503-Speed 3406.34 samples/sec Loss 8.3875 LearningRate 0.0764 Epoch: 2 Global Step: 31260 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:07,530-Speed 3383.68 samples/sec Loss 8.4650 LearningRate 0.0764 Epoch: 2 
Global Step: 31270 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:10,538-Speed 3405.93 samples/sec Loss 8.3351 LearningRate 0.0764 Epoch: 2 Global Step: 31280 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:13,633-Speed 3309.76 samples/sec Loss 8.2932 LearningRate 0.0764 Epoch: 2 Global Step: 31290 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:16,667-Speed 3376.05 samples/sec Loss 8.5801 LearningRate 0.0764 Epoch: 2 Global Step: 31300 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:19,719-Speed 3356.06 samples/sec Loss 8.4008 LearningRate 0.0764 Epoch: 2 Global Step: 31310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:22,727-Speed 3405.46 samples/sec Loss 8.5529 LearningRate 0.0764 Epoch: 2 Global Step: 31320 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:25,747-Speed 3391.26 samples/sec Loss 8.3680 LearningRate 0.0764 Epoch: 2 Global Step: 31330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:28,813-Speed 3341.28 samples/sec Loss 8.5234 LearningRate 0.0764 Epoch: 2 Global Step: 31340 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:31,843-Speed 3379.88 samples/sec Loss 8.4550 LearningRate 0.0764 Epoch: 2 Global Step: 31350 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-04-27 04:27:34,883-Speed 3370.65 samples/sec Loss 8.4277 LearningRate 0.0763 Epoch: 2 Global Step: 31360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:37,897-Speed 3399.00 samples/sec Loss 8.4391 LearningRate 0.0763 Epoch: 2 Global Step: 31370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:40,948-Speed 3356.56 samples/sec Loss 8.4111 LearningRate 0.0763 Epoch: 2 Global Step: 31380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:43,955-Speed 3406.91 samples/sec Loss 8.4085 LearningRate 0.0763 Epoch: 2 Global Step: 31390 Fp16 Grad Scale: 32768 Required: 19 
hours Training: 2022-04-27 04:27:47,025-Speed 3336.57 samples/sec Loss 8.4096 LearningRate 0.0763 Epoch: 2 Global Step: 31400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:50,083-Speed 3350.14 samples/sec Loss 8.4423 LearningRate 0.0763 Epoch: 2 Global Step: 31410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:53,112-Speed 3381.37 samples/sec Loss 8.3178 LearningRate 0.0763 Epoch: 2 Global Step: 31420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:56,154-Speed 3367.02 samples/sec Loss 8.5621 LearningRate 0.0763 Epoch: 2 Global Step: 31430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:27:59,178-Speed 3387.99 samples/sec Loss 8.3926 LearningRate 0.0763 Epoch: 2 Global Step: 31440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:28:02,220-Speed 3366.85 samples/sec Loss 8.3410 LearningRate 0.0763 Epoch: 2 Global Step: 31450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:28:05,242-Speed 3390.57 samples/sec Loss 8.5166 LearningRate 0.0763 Epoch: 2 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:08,286-Speed 3364.33 samples/sec Loss 8.2950 LearningRate 0.0763 Epoch: 2 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:11,316-Speed 3380.70 samples/sec Loss 8.4255 LearningRate 0.0763 Epoch: 2 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:14,406-Speed 3315.32 samples/sec Loss 8.3570 LearningRate 0.0763 Epoch: 2 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:17,428-Speed 3389.57 samples/sec Loss 8.4557 LearningRate 0.0762 Epoch: 2 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:20,441-Speed 3400.05 samples/sec Loss 8.4001 LearningRate 0.0762 Epoch: 2 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:23,465-Speed 3387.07 
samples/sec Loss 8.3110 LearningRate 0.0762 Epoch: 2 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:26,547-Speed 3323.24 samples/sec Loss 8.4131 LearningRate 0.0762 Epoch: 2 Global Step: 31530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:29,561-Speed 3398.74 samples/sec Loss 8.4159 LearningRate 0.0762 Epoch: 2 Global Step: 31540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:32,585-Speed 3387.04 samples/sec Loss 8.3922 LearningRate 0.0762 Epoch: 2 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:35,631-Speed 3363.91 samples/sec Loss 8.3491 LearningRate 0.0762 Epoch: 2 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 04:28:38,693-Speed 3344.77 samples/sec Loss 8.3513 LearningRate 0.0762 Epoch: 2 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 04:28:41,791-Speed 3306.90 samples/sec Loss 8.3260 LearningRate 0.0762 Epoch: 2 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 04:28:44,789-Speed 3416.37 samples/sec Loss 8.4301 LearningRate 0.0762 Epoch: 2 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 04:28:47,829-Speed 3369.35 samples/sec Loss 8.5058 LearningRate 0.0762 Epoch: 2 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:50,838-Speed 3403.95 samples/sec Loss 8.3731 LearningRate 0.0762 Epoch: 2 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:28:53,920-Speed 3324.20 samples/sec Loss 8.4870 LearningRate 0.0762 Epoch: 2 Global Step: 31620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:28:56,953-Speed 3377.49 samples/sec Loss 8.4422 LearningRate 0.0762 Epoch: 2 Global Step: 31630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:28:59,960-Speed 3405.96 samples/sec Loss 8.4677 LearningRate 0.0761 Epoch: 
2 Global Step: 31640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:02,985-Speed 3385.66 samples/sec Loss 8.2311 LearningRate 0.0761 Epoch: 2 Global Step: 31650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:06,058-Speed 3333.82 samples/sec Loss 8.3713 LearningRate 0.0761 Epoch: 2 Global Step: 31660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:09,115-Speed 3351.30 samples/sec Loss 8.4942 LearningRate 0.0761 Epoch: 2 Global Step: 31670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:12,156-Speed 3380.96 samples/sec Loss 8.3758 LearningRate 0.0761 Epoch: 2 Global Step: 31680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:15,247-Speed 3314.47 samples/sec Loss 8.3863 LearningRate 0.0761 Epoch: 2 Global Step: 31690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:18,277-Speed 3380.81 samples/sec Loss 8.4714 LearningRate 0.0761 Epoch: 2 Global Step: 31700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:21,289-Speed 3400.21 samples/sec Loss 8.5507 LearningRate 0.0761 Epoch: 2 Global Step: 31710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 04:29:24,306-Speed 3395.28 samples/sec Loss 8.4808 LearningRate 0.0761 Epoch: 2 Global Step: 31720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:29:27,314-Speed 3405.80 samples/sec Loss 8.3818 LearningRate 0.0761 Epoch: 2 Global Step: 31730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:29:30,326-Speed 3400.54 samples/sec Loss 8.4260 LearningRate 0.0761 Epoch: 2 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 04:29:33,319-Speed 3422.34 samples/sec Loss 8.3820 LearningRate 0.0761 Epoch: 2 Global Step: 31750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:29:36,334-Speed 3397.02 samples/sec Loss 8.4330 LearningRate 0.0761 Epoch: 2 Global Step: 31760 Fp16 Grad Scale: 32768 Required: 
18 hours Training: 2022-04-27 04:29:39,377-Speed 3367.31 samples/sec Loss 8.5144 LearningRate 0.0761 Epoch: 2 Global Step: 31770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:29:42,431-Speed 3354.04 samples/sec Loss 8.4003 LearningRate 0.0761 Epoch: 2 Global Step: 31780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:29:45,486-Speed 3351.99 samples/sec Loss 8.4758 LearningRate 0.0760 Epoch: 2 Global Step: 31790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:29:48,552-Speed 3341.71 samples/sec Loss 8.3984 LearningRate 0.0760 Epoch: 2 Global Step: 31800 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:29:51,582-Speed 3380.85 samples/sec Loss 8.3848 LearningRate 0.0760 Epoch: 2 Global Step: 31810 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:29:54,621-Speed 3370.28 samples/sec Loss 8.3836 LearningRate 0.0760 Epoch: 2 Global Step: 31820 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:29:57,619-Speed 3416.79 samples/sec Loss 8.2474 LearningRate 0.0760 Epoch: 2 Global Step: 31830 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:30:00,624-Speed 3408.65 samples/sec Loss 8.4815 LearningRate 0.0760 Epoch: 2 Global Step: 31840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:30:03,650-Speed 3384.80 samples/sec Loss 8.3424 LearningRate 0.0760 Epoch: 2 Global Step: 31850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:30:06,684-Speed 3376.57 samples/sec Loss 8.4596 LearningRate 0.0760 Epoch: 2 Global Step: 31860 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:30:09,681-Speed 3417.14 samples/sec Loss 8.4879 LearningRate 0.0760 Epoch: 2 Global Step: 31870 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:30:12,716-Speed 3376.35 samples/sec Loss 8.3504 LearningRate 0.0760 Epoch: 2 Global Step: 31880 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:30:15,770-Speed 
3354.10 samples/sec Loss 8.3946 LearningRate 0.0760 Epoch: 2 Global Step: 31890 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:30:18,808-Speed 3370.52 samples/sec Loss 8.3099 LearningRate 0.0760 Epoch: 2 Global Step: 31900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:21,823-Speed 3397.80 samples/sec Loss 8.4533 LearningRate 0.0760 Epoch: 2 Global Step: 31910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:24,874-Speed 3357.73 samples/sec Loss 8.4882 LearningRate 0.0760 Epoch: 2 Global Step: 31920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:27,915-Speed 3367.78 samples/sec Loss 8.4101 LearningRate 0.0759 Epoch: 2 Global Step: 31930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:30,938-Speed 3388.74 samples/sec Loss 8.2884 LearningRate 0.0759 Epoch: 2 Global Step: 31940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:33,931-Speed 3423.05 samples/sec Loss 8.3986 LearningRate 0.0759 Epoch: 2 Global Step: 31950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:36,953-Speed 3389.24 samples/sec Loss 8.3682 LearningRate 0.0759 Epoch: 2 Global Step: 31960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:39,960-Speed 3407.05 samples/sec Loss 8.4243 LearningRate 0.0759 Epoch: 2 Global Step: 31970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:42,973-Speed 3398.73 samples/sec Loss 8.4976 LearningRate 0.0759 Epoch: 2 Global Step: 31980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:45,976-Speed 3411.05 samples/sec Loss 8.4165 LearningRate 0.0759 Epoch: 2 Global Step: 31990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:30:49,101-Speed 3278.31 samples/sec Loss 8.3531 LearningRate 0.0759 Epoch: 2 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:30:52,171-Speed 3336.34 samples/sec Loss 8.4648 LearningRate 0.0759 
Epoch: 2 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:30:55,226-Speed 3353.02 samples/sec Loss 8.4059 LearningRate 0.0759 Epoch: 2 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:30:58,278-Speed 3355.34 samples/sec Loss 8.4149 LearningRate 0.0759 Epoch: 2 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:31:01,311-Speed 3377.48 samples/sec Loss 8.4567 LearningRate 0.0759 Epoch: 2 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:31:04,334-Speed 3389.23 samples/sec Loss 8.4007 LearningRate 0.0759 Epoch: 2 Global Step: 32050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:07,424-Speed 3314.54 samples/sec Loss 8.5078 LearningRate 0.0759 Epoch: 2 Global Step: 32060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:10,428-Speed 3409.74 samples/sec Loss 8.3613 LearningRate 0.0758 Epoch: 2 Global Step: 32070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:13,468-Speed 3370.03 samples/sec Loss 8.4544 LearningRate 0.0758 Epoch: 2 Global Step: 32080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:16,476-Speed 3404.70 samples/sec Loss 8.2556 LearningRate 0.0758 Epoch: 2 Global Step: 32090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:19,486-Speed 3403.23 samples/sec Loss 8.4567 LearningRate 0.0758 Epoch: 2 Global Step: 32100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:22,498-Speed 3401.45 samples/sec Loss 8.4233 LearningRate 0.0758 Epoch: 2 Global Step: 32110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:25,611-Speed 3290.37 samples/sec Loss 8.2420 LearningRate 0.0758 Epoch: 2 Global Step: 32120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:28,684-Speed 3332.82 samples/sec Loss 8.3782 LearningRate 0.0758 Epoch: 2 Global Step: 32130 Fp16 Grad Scale: 32768 
Required: 18 hours Training: 2022-04-27 04:31:31,695-Speed 3401.39 samples/sec Loss 8.3640 LearningRate 0.0758 Epoch: 2 Global Step: 32140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:34,724-Speed 3382.22 samples/sec Loss 8.3952 LearningRate 0.0758 Epoch: 2 Global Step: 32150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:37,785-Speed 3345.87 samples/sec Loss 8.3124 LearningRate 0.0758 Epoch: 2 Global Step: 32160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:40,840-Speed 3352.88 samples/sec Loss 8.4483 LearningRate 0.0758 Epoch: 2 Global Step: 32170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:43,857-Speed 3396.22 samples/sec Loss 8.4173 LearningRate 0.0758 Epoch: 2 Global Step: 32180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:46,857-Speed 3414.40 samples/sec Loss 8.4763 LearningRate 0.0758 Epoch: 2 Global Step: 32190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:49,865-Speed 3405.09 samples/sec Loss 8.3316 LearningRate 0.0758 Epoch: 2 Global Step: 32200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:52,901-Speed 3372.99 samples/sec Loss 8.3335 LearningRate 0.0757 Epoch: 2 Global Step: 32210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:55,984-Speed 3322.96 samples/sec Loss 8.2881 LearningRate 0.0757 Epoch: 2 Global Step: 32220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:31:59,005-Speed 3390.89 samples/sec Loss 8.4159 LearningRate 0.0757 Epoch: 2 Global Step: 32230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:02,081-Speed 3330.31 samples/sec Loss 8.3534 LearningRate 0.0757 Epoch: 2 Global Step: 32240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:05,130-Speed 3358.65 samples/sec Loss 8.4089 LearningRate 0.0757 Epoch: 2 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 
04:32:08,177-Speed 3362.22 samples/sec Loss 8.4424 LearningRate 0.0757 Epoch: 2 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:32:11,177-Speed 3414.40 samples/sec Loss 8.4640 LearningRate 0.0757 Epoch: 2 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:32:14,222-Speed 3364.27 samples/sec Loss 8.4467 LearningRate 0.0757 Epoch: 2 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:32:17,274-Speed 3355.66 samples/sec Loss 8.3282 LearningRate 0.0757 Epoch: 2 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:32:20,315-Speed 3369.32 samples/sec Loss 8.5133 LearningRate 0.0757 Epoch: 2 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:32:23,345-Speed 3380.07 samples/sec Loss 8.4642 LearningRate 0.0757 Epoch: 2 Global Step: 32310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:26,394-Speed 3359.02 samples/sec Loss 8.3915 LearningRate 0.0757 Epoch: 2 Global Step: 32320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:29,509-Speed 3288.38 samples/sec Loss 8.3748 LearningRate 0.0757 Epoch: 2 Global Step: 32330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:32,560-Speed 3357.61 samples/sec Loss 8.2964 LearningRate 0.0757 Epoch: 2 Global Step: 32340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:35,557-Speed 3418.49 samples/sec Loss 8.3898 LearningRate 0.0757 Epoch: 2 Global Step: 32350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:38,573-Speed 3395.83 samples/sec Loss 8.3884 LearningRate 0.0756 Epoch: 2 Global Step: 32360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:41,623-Speed 3359.18 samples/sec Loss 8.2773 LearningRate 0.0756 Epoch: 2 Global Step: 32370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:44,653-Speed 3379.64 samples/sec Loss 8.4285 
LearningRate 0.0756 Epoch: 2 Global Step: 32380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:47,686-Speed 3377.20 samples/sec Loss 8.2905 LearningRate 0.0756 Epoch: 2 Global Step: 32390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:50,718-Speed 3378.33 samples/sec Loss 8.2840 LearningRate 0.0756 Epoch: 2 Global Step: 32400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:53,744-Speed 3385.82 samples/sec Loss 8.4020 LearningRate 0.0756 Epoch: 2 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:32:56,728-Speed 3432.81 samples/sec Loss 8.2645 LearningRate 0.0756 Epoch: 2 Global Step: 32420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:32:59,740-Speed 3399.72 samples/sec Loss 8.3855 LearningRate 0.0756 Epoch: 2 Global Step: 32430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:02,811-Speed 3335.46 samples/sec Loss 8.4320 LearningRate 0.0756 Epoch: 2 Global Step: 32440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:05,876-Speed 3342.94 samples/sec Loss 8.4406 LearningRate 0.0756 Epoch: 2 Global Step: 32450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:08,899-Speed 3387.74 samples/sec Loss 8.3288 LearningRate 0.0756 Epoch: 2 Global Step: 32460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:11,909-Speed 3402.97 samples/sec Loss 8.4930 LearningRate 0.0756 Epoch: 2 Global Step: 32470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:14,917-Speed 3406.13 samples/sec Loss 8.4401 LearningRate 0.0756 Epoch: 2 Global Step: 32480 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:17,939-Speed 3389.51 samples/sec Loss 8.3545 LearningRate 0.0756 Epoch: 2 Global Step: 32490 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:20,932-Speed 3421.43 samples/sec Loss 8.3400 LearningRate 0.0755 Epoch: 2 Global Step: 32500 Fp16 
Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:23,962-Speed 3380.59 samples/sec Loss 8.3355 LearningRate 0.0755 Epoch: 2 Global Step: 32510 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:27,017-Speed 3353.27 samples/sec Loss 8.4701 LearningRate 0.0755 Epoch: 2 Global Step: 32520 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:30,076-Speed 3349.06 samples/sec Loss 8.4054 LearningRate 0.0755 Epoch: 2 Global Step: 32530 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:33,129-Speed 3355.18 samples/sec Loss 8.4010 LearningRate 0.0755 Epoch: 2 Global Step: 32540 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:36,235-Speed 3297.87 samples/sec Loss 8.4294 LearningRate 0.0755 Epoch: 2 Global Step: 32550 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:39,278-Speed 3365.34 samples/sec Loss 8.4366 LearningRate 0.0755 Epoch: 2 Global Step: 32560 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:42,295-Speed 3395.69 samples/sec Loss 8.3867 LearningRate 0.0755 Epoch: 2 Global Step: 32570 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:33:45,303-Speed 3405.65 samples/sec Loss 8.2730 LearningRate 0.0755 Epoch: 2 Global Step: 32580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:48,331-Speed 3382.86 samples/sec Loss 8.4036 LearningRate 0.0755 Epoch: 2 Global Step: 32590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:51,368-Speed 3372.98 samples/sec Loss 8.4149 LearningRate 0.0755 Epoch: 2 Global Step: 32600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:54,423-Speed 3352.01 samples/sec Loss 8.4806 LearningRate 0.0755 Epoch: 2 Global Step: 32610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:33:57,421-Speed 3416.95 samples/sec Loss 8.4338 LearningRate 0.0755 Epoch: 2 Global Step: 32620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 04:34:00,447-Speed 3385.02 samples/sec Loss 8.4692 LearningRate 0.0755 Epoch: 2 Global Step: 32630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:34:03,521-Speed 3332.08 samples/sec Loss 8.4436 LearningRate 0.0754 Epoch: 2 Global Step: 32640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:34:06,596-Speed 3331.94 samples/sec Loss 8.3128 LearningRate 0.0754 Epoch: 2 Global Step: 32650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:34:09,604-Speed 3405.11 samples/sec Loss 8.2924 LearningRate 0.0754 Epoch: 2 Global Step: 32660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:34:12,599-Speed 3419.63 samples/sec Loss 8.2848 LearningRate 0.0754 Epoch: 2 Global Step: 32670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:34:15,631-Speed 3378.69 samples/sec Loss 8.2660 LearningRate 0.0754 Epoch: 2 Global Step: 32680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:18,742-Speed 3292.67 samples/sec Loss 8.2731 LearningRate 0.0754 Epoch: 2 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:21,759-Speed 3395.26 samples/sec Loss 8.4698 LearningRate 0.0754 Epoch: 2 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:24,764-Speed 3408.32 samples/sec Loss 8.4650 LearningRate 0.0754 Epoch: 2 Global Step: 32710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:27,793-Speed 3382.46 samples/sec Loss 8.4809 LearningRate 0.0754 Epoch: 2 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:30,808-Speed 3396.58 samples/sec Loss 8.4045 LearningRate 0.0754 Epoch: 2 Global Step: 32730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:33,872-Speed 3343.40 samples/sec Loss 8.3931 LearningRate 0.0754 Epoch: 2 Global Step: 32740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:36,936-Speed 3343.09 samples/sec Loss 
8.3640 LearningRate 0.0754 Epoch: 2 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:34:39,960-Speed 3388.04 samples/sec Loss 8.3911 LearningRate 0.0754 Epoch: 2 Global Step: 32760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:34:43,053-Speed 3311.50 samples/sec Loss 8.3698 LearningRate 0.0754 Epoch: 2 Global Step: 32770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:34:46,046-Speed 3422.93 samples/sec Loss 8.3842 LearningRate 0.0754 Epoch: 2 Global Step: 32780 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:34:49,087-Speed 3368.79 samples/sec Loss 8.3286 LearningRate 0.0753 Epoch: 2 Global Step: 32790 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:34:52,175-Speed 3316.69 samples/sec Loss 8.4621 LearningRate 0.0753 Epoch: 2 Global Step: 32800 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:34:55,215-Speed 3369.27 samples/sec Loss 8.4643 LearningRate 0.0753 Epoch: 2 Global Step: 32810 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:34:58,211-Speed 3419.28 samples/sec Loss 8.4096 LearningRate 0.0753 Epoch: 2 Global Step: 32820 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:35:01,330-Speed 3283.63 samples/sec Loss 8.3816 LearningRate 0.0753 Epoch: 2 Global Step: 32830 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:35:04,424-Speed 3310.87 samples/sec Loss 8.2275 LearningRate 0.0753 Epoch: 2 Global Step: 32840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:35:07,510-Speed 3319.90 samples/sec Loss 8.2865 LearningRate 0.0753 Epoch: 2 Global Step: 32850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:35:10,568-Speed 3348.94 samples/sec Loss 8.3696 LearningRate 0.0753 Epoch: 2 Global Step: 32860 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:35:13,638-Speed 3336.64 samples/sec Loss 8.3276 LearningRate 0.0753 Epoch: 2 Global Step: 32870 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:35:16,767-Speed 3274.58 samples/sec Loss 8.2343 LearningRate 0.0753 Epoch: 2 Global Step: 32880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:19,799-Speed 3377.45 samples/sec Loss 8.4008 LearningRate 0.0753 Epoch: 2 Global Step: 32890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:22,809-Speed 3403.07 samples/sec Loss 8.3895 LearningRate 0.0753 Epoch: 2 Global Step: 32900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:25,883-Speed 3332.21 samples/sec Loss 8.2681 LearningRate 0.0753 Epoch: 2 Global Step: 32910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:28,939-Speed 3352.27 samples/sec Loss 8.2396 LearningRate 0.0753 Epoch: 2 Global Step: 32920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:32,036-Speed 3308.04 samples/sec Loss 8.4455 LearningRate 0.0752 Epoch: 2 Global Step: 32930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:35,097-Speed 3345.96 samples/sec Loss 8.2866 LearningRate 0.0752 Epoch: 2 Global Step: 32940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:38,202-Speed 3298.84 samples/sec Loss 8.2671 LearningRate 0.0752 Epoch: 2 Global Step: 32950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:41,245-Speed 3366.36 samples/sec Loss 8.3434 LearningRate 0.0752 Epoch: 2 Global Step: 32960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:44,291-Speed 3363.00 samples/sec Loss 8.2148 LearningRate 0.0752 Epoch: 2 Global Step: 32970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:35:47,304-Speed 3399.65 samples/sec Loss 8.4432 LearningRate 0.0752 Epoch: 2 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:35:50,393-Speed 3315.61 samples/sec Loss 8.4348 LearningRate 0.0752 Epoch: 2 Global Step: 32990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 04:35:53,485-Speed 3312.70 samples/sec Loss 8.3606 LearningRate 0.0752 Epoch: 2 Global Step: 33000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:35:56,540-Speed 3352.44 samples/sec Loss 8.3327 LearningRate 0.0752 Epoch: 2 Global Step: 33010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:35:59,576-Speed 3374.75 samples/sec Loss 8.3121 LearningRate 0.0752 Epoch: 2 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:36:02,696-Speed 3283.07 samples/sec Loss 8.4433 LearningRate 0.0752 Epoch: 2 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:36:05,750-Speed 3353.60 samples/sec Loss 8.4417 LearningRate 0.0752 Epoch: 2 Global Step: 33040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:08,775-Speed 3386.01 samples/sec Loss 8.3809 LearningRate 0.0752 Epoch: 2 Global Step: 33050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:11,863-Speed 3317.80 samples/sec Loss 8.3693 LearningRate 0.0752 Epoch: 2 Global Step: 33060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:14,979-Speed 3286.56 samples/sec Loss 8.3394 LearningRate 0.0751 Epoch: 2 Global Step: 33070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:18,068-Speed 3316.30 samples/sec Loss 8.3368 LearningRate 0.0751 Epoch: 2 Global Step: 33080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:21,079-Speed 3402.09 samples/sec Loss 8.2877 LearningRate 0.0751 Epoch: 2 Global Step: 33090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:24,149-Speed 3335.82 samples/sec Loss 8.3553 LearningRate 0.0751 Epoch: 2 Global Step: 33100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:27,189-Speed 3370.44 samples/sec Loss 8.2306 LearningRate 0.0751 Epoch: 2 Global Step: 33110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:30,272-Speed 3322.00 samples/sec Loss 
8.3146 LearningRate 0.0751 Epoch: 2 Global Step: 33120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:34,016-Speed 2735.53 samples/sec Loss 8.2960 LearningRate 0.0751 Epoch: 2 Global Step: 33130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:36:37,072-Speed 3352.50 samples/sec Loss 8.2943 LearningRate 0.0751 Epoch: 2 Global Step: 33140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:36:40,127-Speed 3352.93 samples/sec Loss 8.3191 LearningRate 0.0751 Epoch: 2 Global Step: 33150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:36:43,183-Speed 3351.31 samples/sec Loss 8.3146 LearningRate 0.0751 Epoch: 2 Global Step: 33160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:36:46,198-Speed 3398.18 samples/sec Loss 8.3150 LearningRate 0.0751 Epoch: 2 Global Step: 33170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:36:49,206-Speed 3405.31 samples/sec Loss 8.4054 LearningRate 0.0751 Epoch: 2 Global Step: 33180 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:36:52,306-Speed 3304.59 samples/sec Loss 8.3804 LearningRate 0.0751 Epoch: 2 Global Step: 33190 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:36:55,341-Speed 3374.60 samples/sec Loss 8.4929 LearningRate 0.0751 Epoch: 2 Global Step: 33200 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:36:58,358-Speed 3395.31 samples/sec Loss 8.4128 LearningRate 0.0751 Epoch: 2 Global Step: 33210 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:37:01,383-Speed 3386.09 samples/sec Loss 8.2612 LearningRate 0.0750 Epoch: 2 Global Step: 33220 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:37:04,402-Speed 3393.87 samples/sec Loss 8.3818 LearningRate 0.0750 Epoch: 2 Global Step: 33230 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:37:07,414-Speed 3400.45 samples/sec Loss 8.3683 LearningRate 0.0750 Epoch: 2 Global Step: 33240 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:37:10,480-Speed 3340.39 samples/sec Loss 8.3241 LearningRate 0.0750 Epoch: 2 Global Step: 33250 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:37:13,524-Speed 3365.79 samples/sec Loss 8.4128 LearningRate 0.0750 Epoch: 2 Global Step: 33260 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:37:16,591-Speed 3339.67 samples/sec Loss 8.2665 LearningRate 0.0750 Epoch: 2 Global Step: 33270 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:37:19,604-Speed 3399.28 samples/sec Loss 8.3106 LearningRate 0.0750 Epoch: 2 Global Step: 33280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:22,602-Speed 3417.35 samples/sec Loss 8.2662 LearningRate 0.0750 Epoch: 2 Global Step: 33290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:25,628-Speed 3385.58 samples/sec Loss 8.3027 LearningRate 0.0750 Epoch: 2 Global Step: 33300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:28,673-Speed 3363.29 samples/sec Loss 8.2861 LearningRate 0.0750 Epoch: 2 Global Step: 33310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:31,702-Speed 3381.95 samples/sec Loss 8.3403 LearningRate 0.0750 Epoch: 2 Global Step: 33320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:34,767-Speed 3341.56 samples/sec Loss 8.2706 LearningRate 0.0750 Epoch: 2 Global Step: 33330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:37,792-Speed 3386.12 samples/sec Loss 8.3034 LearningRate 0.0750 Epoch: 2 Global Step: 33340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:40,903-Speed 3292.70 samples/sec Loss 8.4082 LearningRate 0.0750 Epoch: 2 Global Step: 33350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:43,906-Speed 3410.81 samples/sec Loss 8.3240 LearningRate 0.0749 Epoch: 2 Global Step: 33360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 04:37:46,967-Speed 3346.73 samples/sec Loss 8.3454 LearningRate 0.0749 Epoch: 2 Global Step: 33370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:37:49,979-Speed 3400.67 samples/sec Loss 8.2349 LearningRate 0.0749 Epoch: 2 Global Step: 33380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:37:53,010-Speed 3379.76 samples/sec Loss 8.3445 LearningRate 0.0749 Epoch: 2 Global Step: 33390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:37:56,076-Speed 3341.30 samples/sec Loss 8.3314 LearningRate 0.0749 Epoch: 2 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:37:59,109-Speed 3377.13 samples/sec Loss 8.3528 LearningRate 0.0749 Epoch: 2 Global Step: 33410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:38:02,103-Speed 3421.52 samples/sec Loss 8.3601 LearningRate 0.0749 Epoch: 2 Global Step: 33420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:05,189-Speed 3318.44 samples/sec Loss 8.2241 LearningRate 0.0749 Epoch: 2 Global Step: 33430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:08,207-Speed 3394.60 samples/sec Loss 8.4419 LearningRate 0.0749 Epoch: 2 Global Step: 33440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:11,238-Speed 3379.18 samples/sec Loss 8.3862 LearningRate 0.0749 Epoch: 2 Global Step: 33450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:14,246-Speed 3404.93 samples/sec Loss 8.3542 LearningRate 0.0749 Epoch: 2 Global Step: 33460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:17,361-Speed 3289.11 samples/sec Loss 8.3997 LearningRate 0.0749 Epoch: 2 Global Step: 33470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:20,370-Speed 3404.61 samples/sec Loss 8.3206 LearningRate 0.0749 Epoch: 2 Global Step: 33480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:23,400-Speed 3380.10 samples/sec Loss 
8.2396 LearningRate 0.0749 Epoch: 2 Global Step: 33490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:26,449-Speed 3359.31 samples/sec Loss 8.2506 LearningRate 0.0748 Epoch: 2 Global Step: 33500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:29,499-Speed 3358.79 samples/sec Loss 8.2869 LearningRate 0.0748 Epoch: 2 Global Step: 33510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:32,510-Speed 3402.34 samples/sec Loss 8.3356 LearningRate 0.0748 Epoch: 2 Global Step: 33520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:35,570-Speed 3347.07 samples/sec Loss 8.3941 LearningRate 0.0748 Epoch: 2 Global Step: 33530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:38,612-Speed 3367.49 samples/sec Loss 8.3329 LearningRate 0.0748 Epoch: 2 Global Step: 33540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:41,625-Speed 3399.95 samples/sec Loss 8.3344 LearningRate 0.0748 Epoch: 2 Global Step: 33550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:44,657-Speed 3377.71 samples/sec Loss 8.3694 LearningRate 0.0748 Epoch: 2 Global Step: 33560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:47,679-Speed 3390.22 samples/sec Loss 8.1880 LearningRate 0.0748 Epoch: 2 Global Step: 33570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:50,741-Speed 3344.89 samples/sec Loss 8.3020 LearningRate 0.0748 Epoch: 2 Global Step: 33580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:53,749-Speed 3405.71 samples/sec Loss 8.4029 LearningRate 0.0748 Epoch: 2 Global Step: 33590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:56,747-Speed 3417.26 samples/sec Loss 8.2708 LearningRate 0.0748 Epoch: 2 Global Step: 33600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:38:59,753-Speed 3407.89 samples/sec Loss 8.4038 LearningRate 0.0748 Epoch: 2 Global Step: 33610 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:02,749-Speed 3418.58 samples/sec Loss 8.2259 LearningRate 0.0748 Epoch: 2 Global Step: 33620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:39:05,755-Speed 3408.25 samples/sec Loss 8.2564 LearningRate 0.0748 Epoch: 2 Global Step: 33630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:39:08,754-Speed 3415.55 samples/sec Loss 8.2933 LearningRate 0.0748 Epoch: 2 Global Step: 33640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:39:11,765-Speed 3401.88 samples/sec Loss 8.2674 LearningRate 0.0747 Epoch: 2 Global Step: 33650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:14,811-Speed 3362.61 samples/sec Loss 8.2715 LearningRate 0.0747 Epoch: 2 Global Step: 33660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:17,843-Speed 3378.01 samples/sec Loss 8.3213 LearningRate 0.0747 Epoch: 2 Global Step: 33670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:20,852-Speed 3405.05 samples/sec Loss 8.2671 LearningRate 0.0747 Epoch: 2 Global Step: 33680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:23,894-Speed 3367.37 samples/sec Loss 8.1984 LearningRate 0.0747 Epoch: 2 Global Step: 33690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:26,919-Speed 3385.48 samples/sec Loss 8.3400 LearningRate 0.0747 Epoch: 2 Global Step: 33700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:29,973-Speed 3353.92 samples/sec Loss 8.3748 LearningRate 0.0747 Epoch: 2 Global Step: 33710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:32,975-Speed 3412.69 samples/sec Loss 8.2773 LearningRate 0.0747 Epoch: 2 Global Step: 33720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:36,065-Speed 3315.05 samples/sec Loss 8.3669 LearningRate 0.0747 Epoch: 2 Global Step: 33730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 04:39:39,150-Speed 3320.12 samples/sec Loss 8.3024 LearningRate 0.0747 Epoch: 2 Global Step: 33740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:39:42,190-Speed 3369.47 samples/sec Loss 8.2354 LearningRate 0.0747 Epoch: 2 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:39:45,196-Speed 3407.25 samples/sec Loss 8.2660 LearningRate 0.0747 Epoch: 2 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:39:48,204-Speed 3405.85 samples/sec Loss 8.2951 LearningRate 0.0747 Epoch: 2 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:39:51,251-Speed 3361.55 samples/sec Loss 8.2148 LearningRate 0.0747 Epoch: 2 Global Step: 33780 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:39:54,283-Speed 3378.72 samples/sec Loss 8.4605 LearningRate 0.0746 Epoch: 2 Global Step: 33790 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:39:57,299-Speed 3396.48 samples/sec Loss 8.3985 LearningRate 0.0746 Epoch: 2 Global Step: 33800 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:00,347-Speed 3360.27 samples/sec Loss 8.3327 LearningRate 0.0746 Epoch: 2 Global Step: 33810 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:03,395-Speed 3360.64 samples/sec Loss 8.3490 LearningRate 0.0746 Epoch: 2 Global Step: 33820 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:06,426-Speed 3379.69 samples/sec Loss 8.2953 LearningRate 0.0746 Epoch: 2 Global Step: 33830 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:09,418-Speed 3423.62 samples/sec Loss 8.2687 LearningRate 0.0746 Epoch: 2 Global Step: 33840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:12,525-Speed 3296.92 samples/sec Loss 8.3644 LearningRate 0.0746 Epoch: 2 Global Step: 33850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:15,627-Speed 3302.12 samples/sec Loss 
8.4236 LearningRate 0.0746 Epoch: 2 Global Step: 33860 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:18,712-Speed 3320.33 samples/sec Loss 8.2354 LearningRate 0.0746 Epoch: 2 Global Step: 33870 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:40:21,714-Speed 3411.54 samples/sec Loss 8.3655 LearningRate 0.0746 Epoch: 2 Global Step: 33880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:24,771-Speed 3351.53 samples/sec Loss 8.2746 LearningRate 0.0746 Epoch: 2 Global Step: 33890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:27,842-Speed 3335.11 samples/sec Loss 8.4476 LearningRate 0.0746 Epoch: 2 Global Step: 33900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:30,869-Speed 3384.02 samples/sec Loss 8.2958 LearningRate 0.0746 Epoch: 2 Global Step: 33910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:33,864-Speed 3420.17 samples/sec Loss 8.3428 LearningRate 0.0746 Epoch: 2 Global Step: 33920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:36,918-Speed 3354.34 samples/sec Loss 8.4460 LearningRate 0.0745 Epoch: 2 Global Step: 33930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:39,939-Speed 3391.07 samples/sec Loss 8.3681 LearningRate 0.0745 Epoch: 2 Global Step: 33940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:43,022-Speed 3321.46 samples/sec Loss 8.3424 LearningRate 0.0745 Epoch: 2 Global Step: 33950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:46,028-Speed 3408.27 samples/sec Loss 8.1729 LearningRate 0.0745 Epoch: 2 Global Step: 33960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:49,047-Speed 3392.37 samples/sec Loss 8.2617 LearningRate 0.0745 Epoch: 2 Global Step: 33970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:52,055-Speed 3406.29 samples/sec Loss 8.2088 LearningRate 0.0745 Epoch: 2 Global Step: 33980 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:40:55,051-Speed 3418.93 samples/sec Loss 8.4235 LearningRate 0.0745 Epoch: 2 Global Step: 33990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:40:58,042-Speed 3425.45 samples/sec Loss 8.3140 LearningRate 0.0745 Epoch: 2 Global Step: 34000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:01,074-Speed 3377.73 samples/sec Loss 8.2558 LearningRate 0.0745 Epoch: 2 Global Step: 34010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:04,103-Speed 3382.21 samples/sec Loss 8.2574 LearningRate 0.0745 Epoch: 2 Global Step: 34020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:07,159-Speed 3351.69 samples/sec Loss 8.3538 LearningRate 0.0745 Epoch: 2 Global Step: 34030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:10,145-Speed 3430.52 samples/sec Loss 8.3682 LearningRate 0.0745 Epoch: 2 Global Step: 34040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:13,155-Speed 3402.92 samples/sec Loss 8.3444 LearningRate 0.0745 Epoch: 2 Global Step: 34050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:16,173-Speed 3394.52 samples/sec Loss 8.3541 LearningRate 0.0745 Epoch: 2 Global Step: 34060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:19,177-Speed 3409.75 samples/sec Loss 8.2582 LearningRate 0.0745 Epoch: 2 Global Step: 34070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:22,183-Speed 3407.80 samples/sec Loss 8.2262 LearningRate 0.0744 Epoch: 2 Global Step: 34080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:41:25,220-Speed 3372.21 samples/sec Loss 8.4114 LearningRate 0.0744 Epoch: 2 Global Step: 34090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:28,251-Speed 3379.97 samples/sec Loss 8.1528 LearningRate 0.0744 Epoch: 2 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 04:41:31,314-Speed 3343.75 samples/sec Loss 8.3734 LearningRate 0.0744 Epoch: 2 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:34,393-Speed 3327.59 samples/sec Loss 8.2058 LearningRate 0.0744 Epoch: 2 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:37,437-Speed 3364.37 samples/sec Loss 8.5269 LearningRate 0.0744 Epoch: 2 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:40,462-Speed 3386.14 samples/sec Loss 8.2470 LearningRate 0.0744 Epoch: 2 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:43,588-Speed 3276.58 samples/sec Loss 8.3832 LearningRate 0.0744 Epoch: 2 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:46,631-Speed 3366.75 samples/sec Loss 8.2668 LearningRate 0.0744 Epoch: 2 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:49,666-Speed 3374.81 samples/sec Loss 8.2990 LearningRate 0.0744 Epoch: 2 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:52,721-Speed 3352.99 samples/sec Loss 8.2399 LearningRate 0.0744 Epoch: 2 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:41:55,757-Speed 3373.65 samples/sec Loss 8.2469 LearningRate 0.0744 Epoch: 2 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 04:41:58,775-Speed 3394.25 samples/sec Loss 8.2772 LearningRate 0.0744 Epoch: 2 Global Step: 34200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:01,818-Speed 3366.78 samples/sec Loss 8.2007 LearningRate 0.0744 Epoch: 2 Global Step: 34210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:04,890-Speed 3334.35 samples/sec Loss 8.2468 LearningRate 0.0743 Epoch: 2 Global Step: 34220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:07,935-Speed 3363.96 samples/sec Loss 
8.2625 LearningRate 0.0743 Epoch: 2 Global Step: 34230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:10,972-Speed 3372.68 samples/sec Loss 8.2383 LearningRate 0.0743 Epoch: 2 Global Step: 34240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:14,029-Speed 3351.20 samples/sec Loss 8.1914 LearningRate 0.0743 Epoch: 2 Global Step: 34250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:17,059-Speed 3381.26 samples/sec Loss 8.1325 LearningRate 0.0743 Epoch: 2 Global Step: 34260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:20,063-Speed 3409.65 samples/sec Loss 8.2265 LearningRate 0.0743 Epoch: 2 Global Step: 34270 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:23,111-Speed 3360.31 samples/sec Loss 8.4165 LearningRate 0.0743 Epoch: 2 Global Step: 34280 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:26,146-Speed 3375.33 samples/sec Loss 8.2860 LearningRate 0.0743 Epoch: 2 Global Step: 34290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:29,203-Speed 3351.65 samples/sec Loss 8.3537 LearningRate 0.0743 Epoch: 2 Global Step: 34300 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:32,239-Speed 3373.18 samples/sec Loss 8.3032 LearningRate 0.0743 Epoch: 2 Global Step: 34310 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:35,295-Speed 3351.89 samples/sec Loss 8.2970 LearningRate 0.0743 Epoch: 2 Global Step: 34320 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:38,328-Speed 3377.38 samples/sec Loss 8.3953 LearningRate 0.0743 Epoch: 2 Global Step: 34330 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:41,366-Speed 3372.47 samples/sec Loss 8.1877 LearningRate 0.0743 Epoch: 2 Global Step: 34340 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:44,375-Speed 3403.27 samples/sec Loss 8.3181 LearningRate 0.0743 Epoch: 2 Global Step: 34350 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:47,423-Speed 3361.39 samples/sec Loss 8.3582 LearningRate 0.0743 Epoch: 2 Global Step: 34360 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:42:50,493-Speed 3335.46 samples/sec Loss 8.2583 LearningRate 0.0742 Epoch: 2 Global Step: 34370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:53,553-Speed 3348.09 samples/sec Loss 8.3155 LearningRate 0.0742 Epoch: 2 Global Step: 34380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:56,599-Speed 3362.25 samples/sec Loss 8.2399 LearningRate 0.0742 Epoch: 2 Global Step: 34390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:42:59,623-Speed 3387.55 samples/sec Loss 8.1093 LearningRate 0.0742 Epoch: 2 Global Step: 34400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:02,667-Speed 3365.52 samples/sec Loss 8.2429 LearningRate 0.0742 Epoch: 2 Global Step: 34410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:05,718-Speed 3356.96 samples/sec Loss 8.2910 LearningRate 0.0742 Epoch: 2 Global Step: 34420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:08,779-Speed 3346.71 samples/sec Loss 8.3560 LearningRate 0.0742 Epoch: 2 Global Step: 34430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:11,853-Speed 3332.11 samples/sec Loss 8.3457 LearningRate 0.0742 Epoch: 2 Global Step: 34440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:14,918-Speed 3341.40 samples/sec Loss 8.2556 LearningRate 0.0742 Epoch: 2 Global Step: 34450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:17,942-Speed 3388.10 samples/sec Loss 8.2818 LearningRate 0.0742 Epoch: 2 Global Step: 34460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:20,986-Speed 3364.00 samples/sec Loss 8.4600 LearningRate 0.0742 Epoch: 2 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 04:43:24,005-Speed 3393.30 samples/sec Loss 8.2519 LearningRate 0.0742 Epoch: 2 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:43:27,080-Speed 3330.81 samples/sec Loss 8.2033 LearningRate 0.0742 Epoch: 2 Global Step: 34490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:43:30,107-Speed 3384.67 samples/sec Loss 8.1610 LearningRate 0.0742 Epoch: 2 Global Step: 34500 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:33,126-Speed 3393.16 samples/sec Loss 8.2090 LearningRate 0.0741 Epoch: 2 Global Step: 34510 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:36,184-Speed 3349.49 samples/sec Loss 8.2292 LearningRate 0.0741 Epoch: 2 Global Step: 34520 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:39,237-Speed 3355.07 samples/sec Loss 8.3362 LearningRate 0.0741 Epoch: 2 Global Step: 34530 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:42,302-Speed 3341.39 samples/sec Loss 8.3138 LearningRate 0.0741 Epoch: 2 Global Step: 34540 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:45,317-Speed 3398.51 samples/sec Loss 8.1224 LearningRate 0.0741 Epoch: 2 Global Step: 34550 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:48,343-Speed 3384.73 samples/sec Loss 8.3033 LearningRate 0.0741 Epoch: 2 Global Step: 34560 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:51,428-Speed 3320.08 samples/sec Loss 8.2240 LearningRate 0.0741 Epoch: 2 Global Step: 34570 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:54,502-Speed 3331.74 samples/sec Loss 8.2858 LearningRate 0.0741 Epoch: 2 Global Step: 34580 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:43:57,512-Speed 3403.06 samples/sec Loss 8.4127 LearningRate 0.0741 Epoch: 2 Global Step: 34590 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:44:00,533-Speed 3390.58 samples/sec Loss 
8.2108 LearningRate 0.0741 Epoch: 2 Global Step: 34600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:03,569-Speed 3374.17 samples/sec Loss 8.2604 LearningRate 0.0741 Epoch: 2 Global Step: 34610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:06,593-Speed 3387.17 samples/sec Loss 8.2213 LearningRate 0.0741 Epoch: 2 Global Step: 34620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:09,615-Speed 3389.44 samples/sec Loss 8.2434 LearningRate 0.0741 Epoch: 2 Global Step: 34630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:12,634-Speed 3392.72 samples/sec Loss 8.2263 LearningRate 0.0741 Epoch: 2 Global Step: 34640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:15,653-Speed 3393.85 samples/sec Loss 8.2862 LearningRate 0.0740 Epoch: 2 Global Step: 34650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:18,710-Speed 3350.72 samples/sec Loss 8.1698 LearningRate 0.0740 Epoch: 2 Global Step: 34660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:21,751-Speed 3368.39 samples/sec Loss 8.2497 LearningRate 0.0740 Epoch: 2 Global Step: 34670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:24,818-Speed 3339.17 samples/sec Loss 8.4007 LearningRate 0.0740 Epoch: 2 Global Step: 34680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:27,856-Speed 3372.00 samples/sec Loss 8.2585 LearningRate 0.0740 Epoch: 2 Global Step: 34690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:30,877-Speed 3391.19 samples/sec Loss 8.3248 LearningRate 0.0740 Epoch: 2 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:33,901-Speed 3387.70 samples/sec Loss 8.2938 LearningRate 0.0740 Epoch: 2 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:36,971-Speed 3335.73 samples/sec Loss 8.2615 LearningRate 0.0740 Epoch: 2 Global Step: 34720 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:40,065-Speed 3310.73 samples/sec Loss 8.2660 LearningRate 0.0740 Epoch: 2 Global Step: 34730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:43,096-Speed 3378.91 samples/sec Loss 8.3541 LearningRate 0.0740 Epoch: 2 Global Step: 34740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:46,130-Speed 3376.46 samples/sec Loss 8.2802 LearningRate 0.0740 Epoch: 2 Global Step: 34750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:49,232-Speed 3302.88 samples/sec Loss 8.3199 LearningRate 0.0740 Epoch: 2 Global Step: 34760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:52,287-Speed 3351.90 samples/sec Loss 8.2078 LearningRate 0.0740 Epoch: 2 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:44:55,302-Speed 3397.25 samples/sec Loss 8.4577 LearningRate 0.0740 Epoch: 2 Global Step: 34780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:44:58,336-Speed 3377.29 samples/sec Loss 8.2960 LearningRate 0.0740 Epoch: 2 Global Step: 34790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:01,392-Speed 3351.52 samples/sec Loss 8.3752 LearningRate 0.0739 Epoch: 2 Global Step: 34800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:04,438-Speed 3363.27 samples/sec Loss 8.2282 LearningRate 0.0739 Epoch: 2 Global Step: 34810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:07,501-Speed 3343.97 samples/sec Loss 8.3956 LearningRate 0.0739 Epoch: 2 Global Step: 34820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:10,528-Speed 3384.17 samples/sec Loss 8.3949 LearningRate 0.0739 Epoch: 2 Global Step: 34830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:13,594-Speed 3340.54 samples/sec Loss 8.2631 LearningRate 0.0739 Epoch: 2 Global Step: 34840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 04:45:16,644-Speed 3359.31 samples/sec Loss 8.0910 LearningRate 0.0739 Epoch: 2 Global Step: 34850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:19,657-Speed 3398.82 samples/sec Loss 8.1904 LearningRate 0.0739 Epoch: 2 Global Step: 34860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:22,742-Speed 3320.07 samples/sec Loss 8.2680 LearningRate 0.0739 Epoch: 2 Global Step: 34870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:45:25,792-Speed 3358.82 samples/sec Loss 8.3699 LearningRate 0.0739 Epoch: 2 Global Step: 34880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:28,916-Speed 3278.87 samples/sec Loss 8.3032 LearningRate 0.0739 Epoch: 2 Global Step: 34890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:31,985-Speed 3337.81 samples/sec Loss 8.3319 LearningRate 0.0739 Epoch: 2 Global Step: 34900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:35,036-Speed 3357.20 samples/sec Loss 8.2568 LearningRate 0.0739 Epoch: 2 Global Step: 34910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:38,131-Speed 3310.00 samples/sec Loss 8.1565 LearningRate 0.0739 Epoch: 2 Global Step: 34920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:41,225-Speed 3310.75 samples/sec Loss 8.3137 LearningRate 0.0739 Epoch: 2 Global Step: 34930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:44,240-Speed 3397.46 samples/sec Loss 8.1882 LearningRate 0.0738 Epoch: 2 Global Step: 34940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:47,286-Speed 3363.31 samples/sec Loss 8.3353 LearningRate 0.0738 Epoch: 2 Global Step: 34950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:50,349-Speed 3344.29 samples/sec Loss 8.2605 LearningRate 0.0738 Epoch: 2 Global Step: 34960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:53,363-Speed 3397.35 samples/sec Loss 
8.3413 LearningRate 0.0738 Epoch: 2 Global Step: 34970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:56,359-Speed 3420.34 samples/sec Loss 8.2242 LearningRate 0.0738 Epoch: 2 Global Step: 34980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:45:59,416-Speed 3350.37 samples/sec Loss 8.2201 LearningRate 0.0738 Epoch: 2 Global Step: 34990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:46:02,494-Speed 3328.28 samples/sec Loss 8.3550 LearningRate 0.0738 Epoch: 2 Global Step: 35000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:05,546-Speed 3355.92 samples/sec Loss 8.2981 LearningRate 0.0738 Epoch: 2 Global Step: 35010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:08,590-Speed 3364.97 samples/sec Loss 8.2304 LearningRate 0.0738 Epoch: 2 Global Step: 35020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:11,652-Speed 3345.47 samples/sec Loss 8.1427 LearningRate 0.0738 Epoch: 2 Global Step: 35030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:14,711-Speed 3348.42 samples/sec Loss 8.2083 LearningRate 0.0738 Epoch: 2 Global Step: 35040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:17,733-Speed 3389.87 samples/sec Loss 8.1648 LearningRate 0.0738 Epoch: 2 Global Step: 35050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:20,789-Speed 3351.17 samples/sec Loss 8.1774 LearningRate 0.0738 Epoch: 2 Global Step: 35060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:23,841-Speed 3356.43 samples/sec Loss 8.0920 LearningRate 0.0738 Epoch: 2 Global Step: 35070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:26,866-Speed 3386.39 samples/sec Loss 8.2477 LearningRate 0.0738 Epoch: 2 Global Step: 35080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:29,946-Speed 3324.92 samples/sec Loss 8.1168 LearningRate 0.0737 Epoch: 2 Global Step: 35090 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:32,974-Speed 3384.02 samples/sec Loss 8.3325 LearningRate 0.0737 Epoch: 2 Global Step: 35100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:46:36,044-Speed 3335.80 samples/sec Loss 8.2795 LearningRate 0.0737 Epoch: 2 Global Step: 35110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:39,640-Speed 2848.24 samples/sec Loss 8.1375 LearningRate 0.0737 Epoch: 2 Global Step: 35120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:42,773-Speed 3269.69 samples/sec Loss 8.2689 LearningRate 0.0737 Epoch: 2 Global Step: 35130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:45,797-Speed 3387.24 samples/sec Loss 8.1736 LearningRate 0.0737 Epoch: 2 Global Step: 35140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:48,875-Speed 3328.59 samples/sec Loss 8.2944 LearningRate 0.0737 Epoch: 2 Global Step: 35150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:51,933-Speed 3349.11 samples/sec Loss 8.1424 LearningRate 0.0737 Epoch: 2 Global Step: 35160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:54,985-Speed 3356.40 samples/sec Loss 8.2853 LearningRate 0.0737 Epoch: 2 Global Step: 35170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:46:57,994-Speed 3403.73 samples/sec Loss 8.2548 LearningRate 0.0737 Epoch: 2 Global Step: 35180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:01,123-Speed 3274.32 samples/sec Loss 8.3565 LearningRate 0.0737 Epoch: 2 Global Step: 35190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:04,245-Speed 3280.33 samples/sec Loss 8.1812 LearningRate 0.0737 Epoch: 2 Global Step: 35200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:07,284-Speed 3371.49 samples/sec Loss 8.1825 LearningRate 0.0737 Epoch: 2 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 04:47:10,290-Speed 3406.83 samples/sec Loss 8.1467 LearningRate 0.0737 Epoch: 2 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:47:13,389-Speed 3305.77 samples/sec Loss 8.2839 LearningRate 0.0736 Epoch: 2 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:47:16,464-Speed 3330.57 samples/sec Loss 8.2068 LearningRate 0.0736 Epoch: 2 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:47:19,491-Speed 3384.11 samples/sec Loss 8.2495 LearningRate 0.0736 Epoch: 2 Global Step: 35250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:22,520-Speed 3381.67 samples/sec Loss 8.2069 LearningRate 0.0736 Epoch: 2 Global Step: 35260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:25,563-Speed 3366.93 samples/sec Loss 8.2513 LearningRate 0.0736 Epoch: 2 Global Step: 35270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:28,680-Speed 3286.17 samples/sec Loss 8.1840 LearningRate 0.0736 Epoch: 2 Global Step: 35280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:31,739-Speed 3347.83 samples/sec Loss 8.2535 LearningRate 0.0736 Epoch: 2 Global Step: 35290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:34,778-Speed 3371.58 samples/sec Loss 8.2275 LearningRate 0.0736 Epoch: 2 Global Step: 35300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:37,889-Speed 3292.11 samples/sec Loss 8.1874 LearningRate 0.0736 Epoch: 2 Global Step: 35310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:40,930-Speed 3368.27 samples/sec Loss 8.2072 LearningRate 0.0736 Epoch: 2 Global Step: 35320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:43,935-Speed 3409.74 samples/sec Loss 8.1928 LearningRate 0.0736 Epoch: 2 Global Step: 35330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:46,992-Speed 3350.88 samples/sec Loss 
8.1788 LearningRate 0.0736 Epoch: 2 Global Step: 35340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:47:50,087-Speed 3309.47 samples/sec Loss 8.1735 LearningRate 0.0736 Epoch: 2 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:47:53,106-Speed 3392.85 samples/sec Loss 8.3015 LearningRate 0.0736 Epoch: 2 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:47:56,133-Speed 3383.39 samples/sec Loss 8.1782 LearningRate 0.0736 Epoch: 2 Global Step: 35370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:47:59,255-Speed 3281.52 samples/sec Loss 8.2776 LearningRate 0.0735 Epoch: 2 Global Step: 35380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:48:02,368-Speed 3290.66 samples/sec Loss 8.2215 LearningRate 0.0735 Epoch: 2 Global Step: 35390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:05,418-Speed 3358.52 samples/sec Loss 8.2923 LearningRate 0.0735 Epoch: 2 Global Step: 35400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:08,426-Speed 3405.21 samples/sec Loss 8.1955 LearningRate 0.0735 Epoch: 2 Global Step: 35410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:11,451-Speed 3386.08 samples/sec Loss 8.1580 LearningRate 0.0735 Epoch: 2 Global Step: 35420 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:16,400-Speed 2069.87 samples/sec Loss 8.2162 LearningRate 0.0735 Epoch: 2 Global Step: 35430 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:19,434-Speed 3375.37 samples/sec Loss 8.1766 LearningRate 0.0735 Epoch: 2 Global Step: 35440 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:22,498-Speed 3343.09 samples/sec Loss 8.1660 LearningRate 0.0735 Epoch: 2 Global Step: 35450 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:25,609-Speed 3293.02 samples/sec Loss 8.3043 LearningRate 0.0735 Epoch: 2 Global Step: 35460 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:28,631-Speed 3389.23 samples/sec Loss 8.2330 LearningRate 0.0735 Epoch: 2 Global Step: 35470 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:31,713-Speed 3323.60 samples/sec Loss 8.1545 LearningRate 0.0735 Epoch: 2 Global Step: 35480 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:34,757-Speed 3365.99 samples/sec Loss 8.2446 LearningRate 0.0735 Epoch: 2 Global Step: 35490 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:37,776-Speed 3392.97 samples/sec Loss 8.1876 LearningRate 0.0735 Epoch: 2 Global Step: 35500 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:40,837-Speed 3345.71 samples/sec Loss 8.2810 LearningRate 0.0735 Epoch: 2 Global Step: 35510 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:48:43,892-Speed 3352.91 samples/sec Loss 8.2393 LearningRate 0.0734 Epoch: 2 Global Step: 35520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:46,957-Speed 3341.97 samples/sec Loss 8.4077 LearningRate 0.0734 Epoch: 2 Global Step: 35530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:50,018-Speed 3346.54 samples/sec Loss 8.2572 LearningRate 0.0734 Epoch: 2 Global Step: 35540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:53,150-Speed 3270.22 samples/sec Loss 8.1660 LearningRate 0.0734 Epoch: 2 Global Step: 35550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:56,193-Speed 3366.49 samples/sec Loss 8.1280 LearningRate 0.0734 Epoch: 2 Global Step: 35560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:48:59,201-Speed 3405.55 samples/sec Loss 8.1137 LearningRate 0.0734 Epoch: 2 Global Step: 35570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:02,239-Speed 3371.32 samples/sec Loss 8.0892 LearningRate 0.0734 Epoch: 2 Global Step: 35580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 04:49:05,295-Speed 3351.64 samples/sec Loss 8.2026 LearningRate 0.0734 Epoch: 2 Global Step: 35590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:08,306-Speed 3401.86 samples/sec Loss 8.1216 LearningRate 0.0734 Epoch: 2 Global Step: 35600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:11,326-Speed 3391.38 samples/sec Loss 8.2234 LearningRate 0.0734 Epoch: 2 Global Step: 35610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:14,423-Speed 3308.48 samples/sec Loss 8.2962 LearningRate 0.0734 Epoch: 2 Global Step: 35620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:17,514-Speed 3313.71 samples/sec Loss 8.2567 LearningRate 0.0734 Epoch: 2 Global Step: 35630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:20,550-Speed 3373.66 samples/sec Loss 8.1400 LearningRate 0.0734 Epoch: 2 Global Step: 35640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:23,572-Speed 3389.07 samples/sec Loss 8.0452 LearningRate 0.0734 Epoch: 2 Global Step: 35650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:26,660-Speed 3317.64 samples/sec Loss 8.2012 LearningRate 0.0734 Epoch: 2 Global Step: 35660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:29,691-Speed 3379.64 samples/sec Loss 8.0920 LearningRate 0.0733 Epoch: 2 Global Step: 35670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:32,705-Speed 3398.73 samples/sec Loss 8.0894 LearningRate 0.0733 Epoch: 2 Global Step: 35680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:35,727-Speed 3389.19 samples/sec Loss 8.2822 LearningRate 0.0733 Epoch: 2 Global Step: 35690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:38,748-Speed 3390.57 samples/sec Loss 8.1913 LearningRate 0.0733 Epoch: 2 Global Step: 35700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:41,801-Speed 3355.47 samples/sec Loss 
8.2147 LearningRate 0.0733 Epoch: 2 Global Step: 35710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:49:44,849-Speed 3360.64 samples/sec Loss 8.2333 LearningRate 0.0733 Epoch: 2 Global Step: 35720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:49:47,967-Speed 3285.88 samples/sec Loss 8.2121 LearningRate 0.0733 Epoch: 2 Global Step: 35730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:49:51,048-Speed 3324.65 samples/sec Loss 8.1845 LearningRate 0.0733 Epoch: 2 Global Step: 35740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:49:54,113-Speed 3341.22 samples/sec Loss 8.1100 LearningRate 0.0733 Epoch: 2 Global Step: 35750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:49:57,136-Speed 3388.93 samples/sec Loss 8.3004 LearningRate 0.0733 Epoch: 2 Global Step: 35760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:50:00,225-Speed 3315.97 samples/sec Loss 8.3301 LearningRate 0.0733 Epoch: 2 Global Step: 35770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:50:03,310-Speed 3320.49 samples/sec Loss 8.2476 LearningRate 0.0733 Epoch: 2 Global Step: 35780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:06,364-Speed 3353.66 samples/sec Loss 8.2583 LearningRate 0.0733 Epoch: 2 Global Step: 35790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:09,417-Speed 3356.23 samples/sec Loss 8.0862 LearningRate 0.0733 Epoch: 2 Global Step: 35800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:12,488-Speed 3334.97 samples/sec Loss 8.1399 LearningRate 0.0732 Epoch: 2 Global Step: 35810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:15,513-Speed 3386.29 samples/sec Loss 8.2299 LearningRate 0.0732 Epoch: 2 Global Step: 35820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:18,535-Speed 3389.03 samples/sec Loss 8.2712 LearningRate 0.0732 Epoch: 2 Global Step: 35830 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:21,551-Speed 3397.19 samples/sec Loss 8.3024 LearningRate 0.0732 Epoch: 2 Global Step: 35840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:24,621-Speed 3336.79 samples/sec Loss 8.2037 LearningRate 0.0732 Epoch: 2 Global Step: 35850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:27,688-Speed 3340.46 samples/sec Loss 8.1652 LearningRate 0.0732 Epoch: 2 Global Step: 35860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:30,732-Speed 3365.01 samples/sec Loss 8.2979 LearningRate 0.0732 Epoch: 2 Global Step: 35870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:33,767-Speed 3374.32 samples/sec Loss 8.0774 LearningRate 0.0732 Epoch: 2 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:50:36,820-Speed 3356.50 samples/sec Loss 8.2712 LearningRate 0.0732 Epoch: 2 Global Step: 35890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:39,879-Speed 3348.69 samples/sec Loss 8.2077 LearningRate 0.0732 Epoch: 2 Global Step: 35900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:42,952-Speed 3333.28 samples/sec Loss 8.2570 LearningRate 0.0732 Epoch: 2 Global Step: 35910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:45,966-Speed 3398.18 samples/sec Loss 8.1519 LearningRate 0.0732 Epoch: 2 Global Step: 35920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:48,981-Speed 3397.46 samples/sec Loss 8.1826 LearningRate 0.0732 Epoch: 2 Global Step: 35930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:52,100-Speed 3283.45 samples/sec Loss 8.0990 LearningRate 0.0732 Epoch: 2 Global Step: 35940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:50:55,183-Speed 3323.69 samples/sec Loss 8.1283 LearningRate 0.0732 Epoch: 2 Global Step: 35950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 04:50:58,200-Speed 3394.95 samples/sec Loss 8.2236 LearningRate 0.0731 Epoch: 2 Global Step: 35960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:01,233-Speed 3377.03 samples/sec Loss 8.2368 LearningRate 0.0731 Epoch: 2 Global Step: 35970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:04,300-Speed 3340.38 samples/sec Loss 8.1904 LearningRate 0.0731 Epoch: 2 Global Step: 35980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:07,338-Speed 3371.51 samples/sec Loss 8.2688 LearningRate 0.0731 Epoch: 2 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:51:10,340-Speed 3411.78 samples/sec Loss 8.1452 LearningRate 0.0731 Epoch: 2 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:51:13,409-Speed 3338.03 samples/sec Loss 8.1914 LearningRate 0.0731 Epoch: 2 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:51:16,417-Speed 3405.14 samples/sec Loss 8.3227 LearningRate 0.0731 Epoch: 2 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:51:19,438-Speed 3391.63 samples/sec Loss 8.1554 LearningRate 0.0731 Epoch: 2 Global Step: 36030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:22,471-Speed 3376.41 samples/sec Loss 8.2160 LearningRate 0.0731 Epoch: 2 Global Step: 36040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:25,551-Speed 3326.77 samples/sec Loss 8.2507 LearningRate 0.0731 Epoch: 2 Global Step: 36050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:28,565-Speed 3398.48 samples/sec Loss 8.1929 LearningRate 0.0731 Epoch: 2 Global Step: 36060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:31,662-Speed 3306.90 samples/sec Loss 8.2300 LearningRate 0.0731 Epoch: 2 Global Step: 36070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:34,686-Speed 3387.99 samples/sec Loss 
8.1084 LearningRate 0.0731 Epoch: 2 Global Step: 36080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:37,753-Speed 3339.01 samples/sec Loss 8.0509 LearningRate 0.0731 Epoch: 2 Global Step: 36090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:40,810-Speed 3351.23 samples/sec Loss 8.1253 LearningRate 0.0730 Epoch: 2 Global Step: 36100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:43,880-Speed 3336.20 samples/sec Loss 8.1094 LearningRate 0.0730 Epoch: 2 Global Step: 36110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:46,918-Speed 3372.53 samples/sec Loss 8.1796 LearningRate 0.0730 Epoch: 2 Global Step: 36120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:51:49,942-Speed 3386.68 samples/sec Loss 8.1634 LearningRate 0.0730 Epoch: 2 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:51:53,047-Speed 3299.65 samples/sec Loss 8.1585 LearningRate 0.0730 Epoch: 2 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:51:56,119-Speed 3333.96 samples/sec Loss 8.1443 LearningRate 0.0730 Epoch: 2 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:51:59,159-Speed 3368.91 samples/sec Loss 8.1658 LearningRate 0.0730 Epoch: 2 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:02,253-Speed 3311.17 samples/sec Loss 8.1140 LearningRate 0.0730 Epoch: 2 Global Step: 36170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:05,302-Speed 3359.16 samples/sec Loss 8.0717 LearningRate 0.0730 Epoch: 2 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:08,320-Speed 3394.69 samples/sec Loss 8.1851 LearningRate 0.0730 Epoch: 2 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:11,378-Speed 3349.55 samples/sec Loss 8.0545 LearningRate 0.0730 Epoch: 2 Global Step: 36200 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:14,440-Speed 3345.62 samples/sec Loss 8.0724 LearningRate 0.0730 Epoch: 2 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:17,524-Speed 3321.89 samples/sec Loss 8.0163 LearningRate 0.0730 Epoch: 2 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:20,568-Speed 3364.73 samples/sec Loss 8.1979 LearningRate 0.0730 Epoch: 2 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 04:52:23,560-Speed 3423.68 samples/sec Loss 8.1399 LearningRate 0.0730 Epoch: 2 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:52:26,690-Speed 3273.23 samples/sec Loss 8.1455 LearningRate 0.0729 Epoch: 2 Global Step: 36250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:29,707-Speed 3394.74 samples/sec Loss 8.0190 LearningRate 0.0729 Epoch: 2 Global Step: 36260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:32,719-Speed 3400.74 samples/sec Loss 8.2482 LearningRate 0.0729 Epoch: 2 Global Step: 36270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:35,747-Speed 3383.58 samples/sec Loss 8.0631 LearningRate 0.0729 Epoch: 2 Global Step: 36280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:38,777-Speed 3381.21 samples/sec Loss 8.0941 LearningRate 0.0729 Epoch: 2 Global Step: 36290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:41,850-Speed 3332.55 samples/sec Loss 8.1935 LearningRate 0.0729 Epoch: 2 Global Step: 36300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:44,894-Speed 3364.54 samples/sec Loss 8.0874 LearningRate 0.0729 Epoch: 2 Global Step: 36310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:47,942-Speed 3361.73 samples/sec Loss 8.1215 LearningRate 0.0729 Epoch: 2 Global Step: 36320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 04:52:50,961-Speed 3392.72 samples/sec Loss 8.1867 LearningRate 0.0729 Epoch: 2 Global Step: 36330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:53,975-Speed 3399.33 samples/sec Loss 8.2645 LearningRate 0.0729 Epoch: 2 Global Step: 36340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:52:57,003-Speed 3382.99 samples/sec Loss 8.0567 LearningRate 0.0729 Epoch: 2 Global Step: 36350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:53:00,078-Speed 3330.66 samples/sec Loss 8.0956 LearningRate 0.0729 Epoch: 2 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:53:03,093-Speed 3397.40 samples/sec Loss 7.9307 LearningRate 0.0729 Epoch: 2 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:53:06,098-Speed 3408.76 samples/sec Loss 8.2209 LearningRate 0.0729 Epoch: 2 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:53:09,121-Speed 3388.43 samples/sec Loss 7.9954 LearningRate 0.0728 Epoch: 2 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:53:12,131-Speed 3403.74 samples/sec Loss 8.1259 LearningRate 0.0728 Epoch: 2 Global Step: 36400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:53:15,213-Speed 3323.82 samples/sec Loss 8.2049 LearningRate 0.0728 Epoch: 2 Global Step: 36410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:53:18,229-Speed 3396.95 samples/sec Loss 8.2033 LearningRate 0.0728 Epoch: 2 Global Step: 36420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:53:21,280-Speed 3357.52 samples/sec Loss 8.2099 LearningRate 0.0728 Epoch: 2 Global Step: 36430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:53:24,334-Speed 3354.03 samples/sec Loss 8.1099 LearningRate 0.0728 Epoch: 2 Global Step: 36440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:53:27,378-Speed 3365.04 samples/sec Loss 
8.0891 LearningRate 0.0728 Epoch: 2 Global Step: 36450 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:30,405-Speed 3383.74 samples/sec Loss 8.1366 LearningRate 0.0728 Epoch: 2 Global Step: 36460 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:33,420-Speed 3397.49 samples/sec Loss 8.1498 LearningRate 0.0728 Epoch: 2 Global Step: 36470 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:36,449-Speed 3382.17 samples/sec Loss 7.9756 LearningRate 0.0728 Epoch: 2 Global Step: 36480 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:39,584-Speed 3266.66 samples/sec Loss 8.0520 LearningRate 0.0728 Epoch: 2 Global Step: 36490 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:42,673-Speed 3316.43 samples/sec Loss 8.0392 LearningRate 0.0728 Epoch: 2 Global Step: 36500 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:45,704-Speed 3379.88 samples/sec Loss 8.1757 LearningRate 0.0728 Epoch: 2 Global Step: 36510 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:48,773-Speed 3337.35 samples/sec Loss 8.0913 LearningRate 0.0728 Epoch: 2 Global Step: 36520 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:51,852-Speed 3327.19 samples/sec Loss 8.1665 LearningRate 0.0728 Epoch: 2 Global Step: 36530 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:54,876-Speed 3387.42 samples/sec Loss 8.1906 LearningRate 0.0727 Epoch: 2 Global Step: 36540 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:53:57,924-Speed 3360.53 samples/sec Loss 8.0870 LearningRate 0.0727 Epoch: 2 Global Step: 36550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:00,956-Speed 3378.55 samples/sec Loss 8.1727 LearningRate 0.0727 Epoch: 2 Global Step: 36560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:04,041-Speed 3320.48 samples/sec Loss 8.1201 LearningRate 0.0727 Epoch: 2 Global Step: 36570 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:07,079-Speed 3371.87 samples/sec Loss 8.0214 LearningRate 0.0727 Epoch: 2 Global Step: 36580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:10,127-Speed 3361.01 samples/sec Loss 8.0938 LearningRate 0.0727 Epoch: 2 Global Step: 36590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:13,125-Speed 3416.44 samples/sec Loss 8.1319 LearningRate 0.0727 Epoch: 2 Global Step: 36600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:16,254-Speed 3273.39 samples/sec Loss 8.1086 LearningRate 0.0727 Epoch: 2 Global Step: 36610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:19,290-Speed 3374.46 samples/sec Loss 8.1478 LearningRate 0.0727 Epoch: 2 Global Step: 36620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:22,293-Speed 3411.39 samples/sec Loss 8.1995 LearningRate 0.0727 Epoch: 2 Global Step: 36630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:25,312-Speed 3392.13 samples/sec Loss 8.1454 LearningRate 0.0727 Epoch: 2 Global Step: 36640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:28,340-Speed 3383.73 samples/sec Loss 7.9863 LearningRate 0.0727 Epoch: 2 Global Step: 36650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:31,401-Speed 3346.02 samples/sec Loss 8.1837 LearningRate 0.0727 Epoch: 2 Global Step: 36660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:34,418-Speed 3395.19 samples/sec Loss 8.2089 LearningRate 0.0727 Epoch: 2 Global Step: 36670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:37,453-Speed 3374.63 samples/sec Loss 8.1603 LearningRate 0.0726 Epoch: 2 Global Step: 36680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:40,456-Speed 3410.65 samples/sec Loss 8.1061 LearningRate 0.0726 Epoch: 2 Global Step: 36690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 04:54:43,524-Speed 3339.75 samples/sec Loss 8.2177 LearningRate 0.0726 Epoch: 2 Global Step: 36700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:46,527-Speed 3411.00 samples/sec Loss 8.2093 LearningRate 0.0726 Epoch: 2 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:49,658-Speed 3271.45 samples/sec Loss 8.2006 LearningRate 0.0726 Epoch: 2 Global Step: 36720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:52,683-Speed 3386.43 samples/sec Loss 8.2017 LearningRate 0.0726 Epoch: 2 Global Step: 36730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:54:55,729-Speed 3362.58 samples/sec Loss 8.1259 LearningRate 0.0726 Epoch: 2 Global Step: 36740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:54:58,747-Speed 3394.19 samples/sec Loss 8.0456 LearningRate 0.0726 Epoch: 2 Global Step: 36750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:01,767-Speed 3391.41 samples/sec Loss 8.0217 LearningRate 0.0726 Epoch: 2 Global Step: 36760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:04,846-Speed 3327.20 samples/sec Loss 7.9517 LearningRate 0.0726 Epoch: 2 Global Step: 36770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:07,886-Speed 3369.41 samples/sec Loss 8.0379 LearningRate 0.0726 Epoch: 2 Global Step: 36780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:10,919-Speed 3376.30 samples/sec Loss 8.2551 LearningRate 0.0726 Epoch: 2 Global Step: 36790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:13,990-Speed 3336.61 samples/sec Loss 8.0579 LearningRate 0.0726 Epoch: 2 Global Step: 36800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:17,017-Speed 3383.27 samples/sec Loss 8.0713 LearningRate 0.0726 Epoch: 2 Global Step: 36810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:20,098-Speed 3324.82 samples/sec Loss 
8.0705 LearningRate 0.0726 Epoch: 2 Global Step: 36820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:23,131-Speed 3377.29 samples/sec Loss 8.1302 LearningRate 0.0725 Epoch: 2 Global Step: 36830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:26,154-Speed 3388.35 samples/sec Loss 8.1249 LearningRate 0.0725 Epoch: 2 Global Step: 36840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:29,218-Speed 3343.50 samples/sec Loss 8.0824 LearningRate 0.0725 Epoch: 2 Global Step: 36850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:32,301-Speed 3322.98 samples/sec Loss 8.0607 LearningRate 0.0725 Epoch: 2 Global Step: 36860 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:35,381-Speed 3325.69 samples/sec Loss 8.0761 LearningRate 0.0725 Epoch: 2 Global Step: 36870 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:38,423-Speed 3367.42 samples/sec Loss 8.1426 LearningRate 0.0725 Epoch: 2 Global Step: 36880 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:41,448-Speed 3385.66 samples/sec Loss 8.1852 LearningRate 0.0725 Epoch: 2 Global Step: 36890 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:44,486-Speed 3372.22 samples/sec Loss 8.1795 LearningRate 0.0725 Epoch: 2 Global Step: 36900 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:47,518-Speed 3377.95 samples/sec Loss 8.2937 LearningRate 0.0725 Epoch: 2 Global Step: 36910 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:50,524-Speed 3406.83 samples/sec Loss 8.0160 LearningRate 0.0725 Epoch: 2 Global Step: 36920 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:53,544-Speed 3392.69 samples/sec Loss 8.0851 LearningRate 0.0725 Epoch: 2 Global Step: 36930 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:55:56,536-Speed 3423.62 samples/sec Loss 8.0984 LearningRate 0.0725 Epoch: 2 Global Step: 36940 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:55:59,580-Speed 3364.33 samples/sec Loss 8.0518 LearningRate 0.0725 Epoch: 2 Global Step: 36950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:02,603-Speed 3388.53 samples/sec Loss 8.0906 LearningRate 0.0725 Epoch: 2 Global Step: 36960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:05,619-Speed 3397.43 samples/sec Loss 8.0705 LearningRate 0.0725 Epoch: 2 Global Step: 36970 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:08,606-Speed 3429.14 samples/sec Loss 8.1035 LearningRate 0.0724 Epoch: 2 Global Step: 36980 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:11,660-Speed 3353.69 samples/sec Loss 8.1113 LearningRate 0.0724 Epoch: 2 Global Step: 36990 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:14,701-Speed 3367.82 samples/sec Loss 8.0413 LearningRate 0.0724 Epoch: 2 Global Step: 37000 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:17,766-Speed 3342.79 samples/sec Loss 8.0781 LearningRate 0.0724 Epoch: 2 Global Step: 37010 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:20,784-Speed 3393.26 samples/sec Loss 8.1792 LearningRate 0.0724 Epoch: 2 Global Step: 37020 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:23,843-Speed 3348.75 samples/sec Loss 8.1658 LearningRate 0.0724 Epoch: 2 Global Step: 37030 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:26,881-Speed 3372.38 samples/sec Loss 8.1618 LearningRate 0.0724 Epoch: 2 Global Step: 37040 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:29,922-Speed 3367.90 samples/sec Loss 8.1088 LearningRate 0.0724 Epoch: 2 Global Step: 37050 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 04:56:32,967-Speed 3364.12 samples/sec Loss 8.1203 LearningRate 0.0724 Epoch: 2 Global Step: 37060 Fp16 Grad Scale: 16384 Required: 18 hours Training: 
2022-04-27 04:56:36,064-Speed 3307.94 samples/sec Loss 8.0484 LearningRate 0.0724 Epoch: 2 Global Step: 37070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:39,129-Speed 3341.76 samples/sec Loss 8.0611 LearningRate 0.0724 Epoch: 2 Global Step: 37080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:42,146-Speed 3395.19 samples/sec Loss 8.1263 LearningRate 0.0724 Epoch: 2 Global Step: 37090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:45,154-Speed 3405.46 samples/sec Loss 8.3563 LearningRate 0.0724 Epoch: 2 Global Step: 37100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:48,170-Speed 3396.21 samples/sec Loss 8.1166 LearningRate 0.0724 Epoch: 2 Global Step: 37110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:51,212-Speed 3367.42 samples/sec Loss 8.0985 LearningRate 0.0723 Epoch: 2 Global Step: 37120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:54,241-Speed 3381.29 samples/sec Loss 8.0528 LearningRate 0.0723 Epoch: 2 Global Step: 37130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:56:57,288-Speed 3361.86 samples/sec Loss 7.9912 LearningRate 0.0723 Epoch: 2 Global Step: 37140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:00,288-Speed 3414.29 samples/sec Loss 8.0117 LearningRate 0.0723 Epoch: 2 Global Step: 37150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:03,333-Speed 3364.94 samples/sec Loss 8.0628 LearningRate 0.0723 Epoch: 2 Global Step: 37160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:06,412-Speed 3326.41 samples/sec Loss 8.1157 LearningRate 0.0723 Epoch: 2 Global Step: 37170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:57:09,410-Speed 3416.23 samples/sec Loss 8.0448 LearningRate 0.0723 Epoch: 2 Global Step: 37180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:57:12,446-Speed 3374.81 samples/sec Loss 
8.1454 LearningRate 0.0723 Epoch: 2 Global Step: 37190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:57:15,541-Speed 3309.73 samples/sec Loss 8.1489 LearningRate 0.0723 Epoch: 2 Global Step: 37200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:57:18,551-Speed 3402.38 samples/sec Loss 8.0277 LearningRate 0.0723 Epoch: 2 Global Step: 37210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:21,574-Speed 3388.91 samples/sec Loss 7.8973 LearningRate 0.0723 Epoch: 2 Global Step: 37220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:24,604-Speed 3379.98 samples/sec Loss 8.0044 LearningRate 0.0723 Epoch: 2 Global Step: 37230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:27,636-Speed 3379.17 samples/sec Loss 8.0340 LearningRate 0.0723 Epoch: 2 Global Step: 37240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:30,657-Speed 3390.23 samples/sec Loss 8.2785 LearningRate 0.0723 Epoch: 2 Global Step: 37250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:57:33,884-Speed 3173.97 samples/sec Loss 8.0983 LearningRate 0.0723 Epoch: 2 Global Step: 37260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:05,237-Speed 326.62 samples/sec Loss 6.9871 LearningRate 0.0722 Epoch: 3 Global Step: 37270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:08,737-Speed 2926.75 samples/sec Loss 6.5843 LearningRate 0.0722 Epoch: 3 Global Step: 37280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:11,783-Speed 3362.14 samples/sec Loss 6.4734 LearningRate 0.0722 Epoch: 3 Global Step: 37290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:14,891-Speed 3296.55 samples/sec Loss 6.3784 LearningRate 0.0722 Epoch: 3 Global Step: 37300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:17,949-Speed 3350.09 samples/sec Loss 6.4153 LearningRate 0.0722 Epoch: 3 Global Step: 37310 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:58:20,963-Speed 3398.12 samples/sec Loss 6.5037 LearningRate 0.0722 Epoch: 3 Global Step: 37320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:24,038-Speed 3330.64 samples/sec Loss 6.4025 LearningRate 0.0722 Epoch: 3 Global Step: 37330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:27,055-Speed 3395.70 samples/sec Loss 6.4574 LearningRate 0.0722 Epoch: 3 Global Step: 37340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:30,061-Speed 3407.11 samples/sec Loss 6.4113 LearningRate 0.0722 Epoch: 3 Global Step: 37350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:33,079-Speed 3394.87 samples/sec Loss 6.3736 LearningRate 0.0722 Epoch: 3 Global Step: 37360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:36,116-Speed 3372.99 samples/sec Loss 6.4045 LearningRate 0.0722 Epoch: 3 Global Step: 37370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:39,166-Speed 3357.97 samples/sec Loss 6.3196 LearningRate 0.0722 Epoch: 3 Global Step: 37380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:42,262-Speed 3309.05 samples/sec Loss 6.4262 LearningRate 0.0722 Epoch: 3 Global Step: 37390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:45,261-Speed 3415.55 samples/sec Loss 6.5546 LearningRate 0.0722 Epoch: 3 Global Step: 37400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:48,280-Speed 3392.60 samples/sec Loss 6.4501 LearningRate 0.0721 Epoch: 3 Global Step: 37410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:58:51,358-Speed 3328.15 samples/sec Loss 6.4731 LearningRate 0.0721 Epoch: 3 Global Step: 37420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:58:54,382-Speed 3387.21 samples/sec Loss 6.4842 LearningRate 0.0721 Epoch: 3 Global Step: 37430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 04:58:57,381-Speed 3415.77 samples/sec Loss 6.4561 LearningRate 0.0721 Epoch: 3 Global Step: 37440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:00,441-Speed 3347.77 samples/sec Loss 6.4149 LearningRate 0.0721 Epoch: 3 Global Step: 37450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:03,476-Speed 3374.81 samples/sec Loss 6.5143 LearningRate 0.0721 Epoch: 3 Global Step: 37460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:06,535-Speed 3347.98 samples/sec Loss 6.3945 LearningRate 0.0721 Epoch: 3 Global Step: 37470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:09,556-Speed 3391.13 samples/sec Loss 6.4545 LearningRate 0.0721 Epoch: 3 Global Step: 37480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:12,561-Speed 3408.16 samples/sec Loss 6.4055 LearningRate 0.0721 Epoch: 3 Global Step: 37490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:15,646-Speed 3320.68 samples/sec Loss 6.5315 LearningRate 0.0721 Epoch: 3 Global Step: 37500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:18,731-Speed 3319.98 samples/sec Loss 6.5518 LearningRate 0.0721 Epoch: 3 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:21,768-Speed 3373.46 samples/sec Loss 6.5634 LearningRate 0.0721 Epoch: 3 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 04:59:24,811-Speed 3365.53 samples/sec Loss 6.5250 LearningRate 0.0721 Epoch: 3 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:27,861-Speed 3358.94 samples/sec Loss 6.4222 LearningRate 0.0721 Epoch: 3 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:30,926-Speed 3341.62 samples/sec Loss 6.4483 LearningRate 0.0721 Epoch: 3 Global Step: 37550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:33,965-Speed 3371.02 samples/sec Loss 
6.5204 LearningRate 0.0720 Epoch: 3 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:37,025-Speed 3347.34 samples/sec Loss 6.4790 LearningRate 0.0720 Epoch: 3 Global Step: 37570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:40,088-Speed 3343.99 samples/sec Loss 6.5234 LearningRate 0.0720 Epoch: 3 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:43,127-Speed 3370.37 samples/sec Loss 6.5625 LearningRate 0.0720 Epoch: 3 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:46,192-Speed 3342.83 samples/sec Loss 6.5597 LearningRate 0.0720 Epoch: 3 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 04:59:49,254-Speed 3344.92 samples/sec Loss 6.5054 LearningRate 0.0720 Epoch: 3 Global Step: 37610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:59:52,305-Speed 3357.86 samples/sec Loss 6.6064 LearningRate 0.0720 Epoch: 3 Global Step: 37620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:59:55,366-Speed 3346.08 samples/sec Loss 6.5634 LearningRate 0.0720 Epoch: 3 Global Step: 37630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 04:59:58,393-Speed 3384.28 samples/sec Loss 6.5832 LearningRate 0.0720 Epoch: 3 Global Step: 37640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:01,446-Speed 3355.23 samples/sec Loss 6.5040 LearningRate 0.0720 Epoch: 3 Global Step: 37650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:04,491-Speed 3363.23 samples/sec Loss 6.5711 LearningRate 0.0720 Epoch: 3 Global Step: 37660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:07,541-Speed 3359.07 samples/sec Loss 6.4350 LearningRate 0.0720 Epoch: 3 Global Step: 37670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:10,547-Speed 3407.52 samples/sec Loss 6.5994 LearningRate 0.0720 Epoch: 3 Global Step: 37680 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:13,608-Speed 3346.21 samples/sec Loss 6.5329 LearningRate 0.0720 Epoch: 3 Global Step: 37690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:16,651-Speed 3365.87 samples/sec Loss 6.5574 LearningRate 0.0720 Epoch: 3 Global Step: 37700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:19,731-Speed 3326.21 samples/sec Loss 6.6449 LearningRate 0.0719 Epoch: 3 Global Step: 37710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:00:22,767-Speed 3374.04 samples/sec Loss 6.6470 LearningRate 0.0719 Epoch: 3 Global Step: 37720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:00:25,859-Speed 3312.21 samples/sec Loss 6.6696 LearningRate 0.0719 Epoch: 3 Global Step: 37730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:00:28,907-Speed 3360.48 samples/sec Loss 6.5607 LearningRate 0.0719 Epoch: 3 Global Step: 37740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:00:31,947-Speed 3370.15 samples/sec Loss 6.6684 LearningRate 0.0719 Epoch: 3 Global Step: 37750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:34,958-Speed 3402.02 samples/sec Loss 6.6582 LearningRate 0.0719 Epoch: 3 Global Step: 37760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:37,961-Speed 3410.95 samples/sec Loss 6.7519 LearningRate 0.0719 Epoch: 3 Global Step: 37770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:40,994-Speed 3377.01 samples/sec Loss 6.6867 LearningRate 0.0719 Epoch: 3 Global Step: 37780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:44,036-Speed 3366.60 samples/sec Loss 6.5871 LearningRate 0.0719 Epoch: 3 Global Step: 37790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:47,048-Speed 3401.34 samples/sec Loss 6.7045 LearningRate 0.0719 Epoch: 3 Global Step: 37800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:00:50,083-Speed 3374.72 samples/sec Loss 6.6907 LearningRate 0.0719 Epoch: 3 Global Step: 37810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:53,165-Speed 3323.65 samples/sec Loss 6.6073 LearningRate 0.0719 Epoch: 3 Global Step: 37820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:56,190-Speed 3386.66 samples/sec Loss 6.6567 LearningRate 0.0719 Epoch: 3 Global Step: 37830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:00:59,234-Speed 3364.89 samples/sec Loss 6.5275 LearningRate 0.0719 Epoch: 3 Global Step: 37840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:02,313-Speed 3326.69 samples/sec Loss 6.6567 LearningRate 0.0718 Epoch: 3 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:01:05,392-Speed 3327.59 samples/sec Loss 6.6778 LearningRate 0.0718 Epoch: 3 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:01:08,417-Speed 3385.92 samples/sec Loss 6.8308 LearningRate 0.0718 Epoch: 3 Global Step: 37870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:11,428-Speed 3401.26 samples/sec Loss 6.6722 LearningRate 0.0718 Epoch: 3 Global Step: 37880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:14,493-Speed 3342.32 samples/sec Loss 6.5297 LearningRate 0.0718 Epoch: 3 Global Step: 37890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:17,600-Speed 3297.29 samples/sec Loss 6.6083 LearningRate 0.0718 Epoch: 3 Global Step: 37900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:20,612-Speed 3400.47 samples/sec Loss 6.6815 LearningRate 0.0718 Epoch: 3 Global Step: 37910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:23,628-Speed 3396.09 samples/sec Loss 6.6927 LearningRate 0.0718 Epoch: 3 Global Step: 37920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:26,721-Speed 3311.58 samples/sec Loss 
6.8571 LearningRate 0.0718 Epoch: 3 Global Step: 37930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:29,773-Speed 3356.44 samples/sec Loss 6.8532 LearningRate 0.0718 Epoch: 3 Global Step: 37940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:32,790-Speed 3395.85 samples/sec Loss 6.7679 LearningRate 0.0718 Epoch: 3 Global Step: 37950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:35,855-Speed 3341.87 samples/sec Loss 6.7341 LearningRate 0.0718 Epoch: 3 Global Step: 37960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:38,903-Speed 3360.62 samples/sec Loss 6.7055 LearningRate 0.0718 Epoch: 3 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:01:41,997-Speed 3310.47 samples/sec Loss 6.7180 LearningRate 0.0718 Epoch: 3 Global Step: 37980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:45,024-Speed 3384.24 samples/sec Loss 6.5578 LearningRate 0.0718 Epoch: 3 Global Step: 37990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:48,036-Speed 3400.89 samples/sec Loss 6.8036 LearningRate 0.0717 Epoch: 3 Global Step: 38000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:51,062-Speed 3384.72 samples/sec Loss 6.6521 LearningRate 0.0717 Epoch: 3 Global Step: 38010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:54,099-Speed 3373.50 samples/sec Loss 6.8511 LearningRate 0.0717 Epoch: 3 Global Step: 38020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:01:57,114-Speed 3397.16 samples/sec Loss 6.8337 LearningRate 0.0717 Epoch: 3 Global Step: 38030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:00,117-Speed 3410.74 samples/sec Loss 6.7960 LearningRate 0.0717 Epoch: 3 Global Step: 38040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:03,159-Speed 3368.35 samples/sec Loss 6.7420 LearningRate 0.0717 Epoch: 3 Global Step: 38050 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:06,182-Speed 3387.89 samples/sec Loss 6.8332 LearningRate 0.0717 Epoch: 3 Global Step: 38060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:09,192-Speed 3403.56 samples/sec Loss 6.6281 LearningRate 0.0717 Epoch: 3 Global Step: 38070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:12,235-Speed 3365.74 samples/sec Loss 6.8279 LearningRate 0.0717 Epoch: 3 Global Step: 38080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:02:15,241-Speed 3407.39 samples/sec Loss 6.6854 LearningRate 0.0717 Epoch: 3 Global Step: 38090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:02:18,239-Speed 3417.24 samples/sec Loss 6.7432 LearningRate 0.0717 Epoch: 3 Global Step: 38100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:21,265-Speed 3384.91 samples/sec Loss 6.8264 LearningRate 0.0717 Epoch: 3 Global Step: 38110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:24,318-Speed 3355.69 samples/sec Loss 6.8080 LearningRate 0.0717 Epoch: 3 Global Step: 38120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:27,335-Speed 3395.35 samples/sec Loss 6.7411 LearningRate 0.0717 Epoch: 3 Global Step: 38130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:30,391-Speed 3352.04 samples/sec Loss 6.6986 LearningRate 0.0717 Epoch: 3 Global Step: 38140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:33,406-Speed 3397.48 samples/sec Loss 6.8863 LearningRate 0.0716 Epoch: 3 Global Step: 38150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:36,458-Speed 3355.93 samples/sec Loss 6.8804 LearningRate 0.0716 Epoch: 3 Global Step: 38160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:39,480-Speed 3389.59 samples/sec Loss 6.7491 LearningRate 0.0716 Epoch: 3 Global Step: 38170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:02:42,585-Speed 3299.42 samples/sec Loss 6.7146 LearningRate 0.0716 Epoch: 3 Global Step: 38180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:45,585-Speed 3414.44 samples/sec Loss 6.8566 LearningRate 0.0716 Epoch: 3 Global Step: 38190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:02:48,696-Speed 3292.65 samples/sec Loss 6.8920 LearningRate 0.0716 Epoch: 3 Global Step: 38200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:02:51,800-Speed 3299.68 samples/sec Loss 7.0516 LearningRate 0.0716 Epoch: 3 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:02:54,859-Speed 3348.59 samples/sec Loss 6.9954 LearningRate 0.0716 Epoch: 3 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:02:57,858-Speed 3415.20 samples/sec Loss 6.7958 LearningRate 0.0716 Epoch: 3 Global Step: 38230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:03:00,909-Speed 3357.68 samples/sec Loss 6.8582 LearningRate 0.0716 Epoch: 3 Global Step: 38240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:04,003-Speed 3310.34 samples/sec Loss 6.7146 LearningRate 0.0716 Epoch: 3 Global Step: 38250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:07,109-Speed 3298.74 samples/sec Loss 6.8047 LearningRate 0.0716 Epoch: 3 Global Step: 38260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:10,129-Speed 3391.86 samples/sec Loss 6.9378 LearningRate 0.0716 Epoch: 3 Global Step: 38270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:13,152-Speed 3387.77 samples/sec Loss 6.8330 LearningRate 0.0716 Epoch: 3 Global Step: 38280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:16,247-Speed 3309.99 samples/sec Loss 6.8845 LearningRate 0.0715 Epoch: 3 Global Step: 38290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:19,344-Speed 3307.44 samples/sec Loss 
6.9465 LearningRate 0.0715 Epoch: 3 Global Step: 38300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:22,392-Speed 3360.70 samples/sec Loss 6.7882 LearningRate 0.0715 Epoch: 3 Global Step: 38310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:25,442-Speed 3358.90 samples/sec Loss 6.9501 LearningRate 0.0715 Epoch: 3 Global Step: 38320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:28,475-Speed 3377.14 samples/sec Loss 6.8386 LearningRate 0.0715 Epoch: 3 Global Step: 38330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:31,517-Speed 3367.19 samples/sec Loss 6.8763 LearningRate 0.0715 Epoch: 3 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:03:34,535-Speed 3393.83 samples/sec Loss 6.9865 LearningRate 0.0715 Epoch: 3 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:03:37,559-Speed 3388.03 samples/sec Loss 6.9001 LearningRate 0.0715 Epoch: 3 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:03:40,616-Speed 3350.08 samples/sec Loss 6.9600 LearningRate 0.0715 Epoch: 3 Global Step: 38370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:03:43,636-Speed 3392.07 samples/sec Loss 6.9826 LearningRate 0.0715 Epoch: 3 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:03:46,650-Speed 3397.69 samples/sec Loss 6.8845 LearningRate 0.0715 Epoch: 3 Global Step: 38390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:49,742-Speed 3312.99 samples/sec Loss 6.8322 LearningRate 0.0715 Epoch: 3 Global Step: 38400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:52,763-Speed 3391.55 samples/sec Loss 6.9328 LearningRate 0.0715 Epoch: 3 Global Step: 38410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:55,770-Speed 3406.03 samples/sec Loss 6.8385 LearningRate 0.0715 Epoch: 3 Global Step: 38420 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:03:58,806-Speed 3374.59 samples/sec Loss 6.9734 LearningRate 0.0715 Epoch: 3 Global Step: 38430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:01,878-Speed 3334.35 samples/sec Loss 6.9084 LearningRate 0.0714 Epoch: 3 Global Step: 38440 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:04,930-Speed 3356.53 samples/sec Loss 6.9613 LearningRate 0.0714 Epoch: 3 Global Step: 38450 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:07,935-Speed 3408.02 samples/sec Loss 6.9162 LearningRate 0.0714 Epoch: 3 Global Step: 38460 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:10,966-Speed 3380.31 samples/sec Loss 6.8513 LearningRate 0.0714 Epoch: 3 Global Step: 38470 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:14,043-Speed 3328.87 samples/sec Loss 6.9831 LearningRate 0.0714 Epoch: 3 Global Step: 38480 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:17,123-Speed 3325.06 samples/sec Loss 6.8802 LearningRate 0.0714 Epoch: 3 Global Step: 38490 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:20,119-Speed 3419.06 samples/sec Loss 7.0114 LearningRate 0.0714 Epoch: 3 Global Step: 38500 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:23,126-Speed 3406.72 samples/sec Loss 6.9964 LearningRate 0.0714 Epoch: 3 Global Step: 38510 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:26,182-Speed 3352.13 samples/sec Loss 6.9662 LearningRate 0.0714 Epoch: 3 Global Step: 38520 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:29,211-Speed 3381.68 samples/sec Loss 7.0059 LearningRate 0.0714 Epoch: 3 Global Step: 38530 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:04:32,249-Speed 3372.36 samples/sec Loss 6.8451 LearningRate 0.0714 Epoch: 3 Global Step: 38540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:04:35,248-Speed 3415.45 samples/sec Loss 7.0581 LearningRate 0.0714 Epoch: 3 Global Step: 38550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:38,292-Speed 3364.92 samples/sec Loss 6.9745 LearningRate 0.0714 Epoch: 3 Global Step: 38560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:41,337-Speed 3364.17 samples/sec Loss 6.9158 LearningRate 0.0714 Epoch: 3 Global Step: 38570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:44,343-Speed 3407.62 samples/sec Loss 7.1026 LearningRate 0.0714 Epoch: 3 Global Step: 38580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:47,415-Speed 3334.74 samples/sec Loss 6.9675 LearningRate 0.0713 Epoch: 3 Global Step: 38590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:50,446-Speed 3378.96 samples/sec Loss 7.0229 LearningRate 0.0713 Epoch: 3 Global Step: 38600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:53,444-Speed 3416.76 samples/sec Loss 6.8184 LearningRate 0.0713 Epoch: 3 Global Step: 38610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:56,449-Speed 3408.94 samples/sec Loss 6.9186 LearningRate 0.0713 Epoch: 3 Global Step: 38620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:04:59,454-Speed 3409.20 samples/sec Loss 7.0349 LearningRate 0.0713 Epoch: 3 Global Step: 38630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:02,608-Speed 3247.39 samples/sec Loss 6.8983 LearningRate 0.0713 Epoch: 3 Global Step: 38640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:05:05,637-Speed 3382.14 samples/sec Loss 7.0805 LearningRate 0.0713 Epoch: 3 Global Step: 38650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:05:08,641-Speed 3409.51 samples/sec Loss 7.0000 LearningRate 0.0713 Epoch: 3 Global Step: 38660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:11,675-Speed 3377.06 samples/sec Loss 
6.9642 LearningRate 0.0713 Epoch: 3 Global Step: 38670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:14,703-Speed 3382.82 samples/sec Loss 6.9170 LearningRate 0.0713 Epoch: 3 Global Step: 38680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:17,722-Speed 3392.66 samples/sec Loss 7.0512 LearningRate 0.0713 Epoch: 3 Global Step: 38690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:20,737-Speed 3396.43 samples/sec Loss 6.9801 LearningRate 0.0713 Epoch: 3 Global Step: 38700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:23,754-Speed 3395.81 samples/sec Loss 7.0077 LearningRate 0.0713 Epoch: 3 Global Step: 38710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:26,770-Speed 3396.22 samples/sec Loss 7.0937 LearningRate 0.0713 Epoch: 3 Global Step: 38720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:29,768-Speed 3417.05 samples/sec Loss 6.9624 LearningRate 0.0712 Epoch: 3 Global Step: 38730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:32,862-Speed 3309.62 samples/sec Loss 7.1036 LearningRate 0.0712 Epoch: 3 Global Step: 38740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:35,976-Speed 3290.71 samples/sec Loss 6.8847 LearningRate 0.0712 Epoch: 3 Global Step: 38750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:05:39,000-Speed 3387.38 samples/sec Loss 6.9359 LearningRate 0.0712 Epoch: 3 Global Step: 38760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:05:42,117-Speed 3285.02 samples/sec Loss 6.9088 LearningRate 0.0712 Epoch: 3 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:05:45,197-Speed 3327.01 samples/sec Loss 7.0581 LearningRate 0.0712 Epoch: 3 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:05:48,253-Speed 3350.99 samples/sec Loss 6.9532 LearningRate 0.0712 Epoch: 3 Global Step: 38790 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:05:51,375-Speed 3280.86 samples/sec Loss 6.9803 LearningRate 0.0712 Epoch: 3 Global Step: 38800 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:05:54,450-Speed 3331.01 samples/sec Loss 7.1318 LearningRate 0.0712 Epoch: 3 Global Step: 38810 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:05:57,497-Speed 3362.68 samples/sec Loss 6.9287 LearningRate 0.0712 Epoch: 3 Global Step: 38820 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:06:00,622-Speed 3277.72 samples/sec Loss 7.0771 LearningRate 0.0712 Epoch: 3 Global Step: 38830 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:06:03,706-Speed 3320.32 samples/sec Loss 7.0831 LearningRate 0.0712 Epoch: 3 Global Step: 38840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:06:06,717-Speed 3402.17 samples/sec Loss 7.1290 LearningRate 0.0712 Epoch: 3 Global Step: 38850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:06:09,736-Speed 3392.97 samples/sec Loss 7.0510 LearningRate 0.0712 Epoch: 3 Global Step: 38860 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:06:12,872-Speed 3266.26 samples/sec Loss 7.0872 LearningRate 0.0712 Epoch: 3 Global Step: 38870 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:06:15,936-Speed 3343.63 samples/sec Loss 7.0517 LearningRate 0.0711 Epoch: 3 Global Step: 38880 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:06:18,951-Speed 3397.28 samples/sec Loss 6.8099 LearningRate 0.0711 Epoch: 3 Global Step: 38890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:21,957-Speed 3407.64 samples/sec Loss 7.1268 LearningRate 0.0711 Epoch: 3 Global Step: 38900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:25,072-Speed 3288.06 samples/sec Loss 7.1024 LearningRate 0.0711 Epoch: 3 Global Step: 38910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:06:28,141-Speed 3338.12 samples/sec Loss 7.0633 LearningRate 0.0711 Epoch: 3 Global Step: 38920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:31,163-Speed 3389.93 samples/sec Loss 7.1999 LearningRate 0.0711 Epoch: 3 Global Step: 38930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:34,164-Speed 3413.10 samples/sec Loss 7.0107 LearningRate 0.0711 Epoch: 3 Global Step: 38940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:37,247-Speed 3322.24 samples/sec Loss 7.0417 LearningRate 0.0711 Epoch: 3 Global Step: 38950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:40,312-Speed 3341.56 samples/sec Loss 7.0254 LearningRate 0.0711 Epoch: 3 Global Step: 38960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:43,347-Speed 3375.50 samples/sec Loss 7.1480 LearningRate 0.0711 Epoch: 3 Global Step: 38970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:46,398-Speed 3357.29 samples/sec Loss 7.0167 LearningRate 0.0711 Epoch: 3 Global Step: 38980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:06:49,434-Speed 3374.05 samples/sec Loss 7.0609 LearningRate 0.0711 Epoch: 3 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:06:52,448-Speed 3398.19 samples/sec Loss 7.0680 LearningRate 0.0711 Epoch: 3 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:06:55,474-Speed 3386.08 samples/sec Loss 7.0802 LearningRate 0.0711 Epoch: 3 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:06:58,520-Speed 3362.65 samples/sec Loss 7.0458 LearningRate 0.0711 Epoch: 3 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:07:01,558-Speed 3372.06 samples/sec Loss 7.1194 LearningRate 0.0710 Epoch: 3 Global Step: 39030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:04,645-Speed 3318.19 samples/sec Loss 
7.1511 LearningRate 0.0710 Epoch: 3 Global Step: 39040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:07,733-Speed 3317.47 samples/sec Loss 7.1470 LearningRate 0.0710 Epoch: 3 Global Step: 39050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:10,781-Speed 3360.25 samples/sec Loss 7.1126 LearningRate 0.0710 Epoch: 3 Global Step: 39060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:13,823-Speed 3366.98 samples/sec Loss 7.1315 LearningRate 0.0710 Epoch: 3 Global Step: 39070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:16,846-Speed 3389.17 samples/sec Loss 7.1343 LearningRate 0.0710 Epoch: 3 Global Step: 39080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:19,883-Speed 3371.61 samples/sec Loss 7.1201 LearningRate 0.0710 Epoch: 3 Global Step: 39090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:22,919-Speed 3374.85 samples/sec Loss 7.0655 LearningRate 0.0710 Epoch: 3 Global Step: 39100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:25,958-Speed 3370.32 samples/sec Loss 7.0654 LearningRate 0.0710 Epoch: 3 Global Step: 39110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:29,031-Speed 3332.96 samples/sec Loss 7.1535 LearningRate 0.0710 Epoch: 3 Global Step: 39120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:32,074-Speed 3366.17 samples/sec Loss 7.2027 LearningRate 0.0710 Epoch: 3 Global Step: 39130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:07:35,146-Speed 3335.32 samples/sec Loss 7.2021 LearningRate 0.0710 Epoch: 3 Global Step: 39140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:07:38,217-Speed 3334.88 samples/sec Loss 7.1434 LearningRate 0.0710 Epoch: 3 Global Step: 39150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:07:41,236-Speed 3393.29 samples/sec Loss 7.1450 LearningRate 0.0710 Epoch: 3 Global Step: 39160 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:07:44,269-Speed 3377.35 samples/sec Loss 7.0762 LearningRate 0.0710 Epoch: 3 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:07:47,303-Speed 3376.72 samples/sec Loss 7.2894 LearningRate 0.0709 Epoch: 3 Global Step: 39180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:50,322-Speed 3392.94 samples/sec Loss 7.0737 LearningRate 0.0709 Epoch: 3 Global Step: 39190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:07:53,438-Speed 3287.34 samples/sec Loss 7.2241 LearningRate 0.0709 Epoch: 3 Global Step: 39200 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:07:56,497-Speed 3348.05 samples/sec Loss 7.3175 LearningRate 0.0709 Epoch: 3 Global Step: 39210 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:07:59,520-Speed 3388.75 samples/sec Loss 7.0297 LearningRate 0.0709 Epoch: 3 Global Step: 39220 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:08:02,566-Speed 3362.92 samples/sec Loss 7.1280 LearningRate 0.0709 Epoch: 3 Global Step: 39230 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:08:05,600-Speed 3375.83 samples/sec Loss 7.1316 LearningRate 0.0709 Epoch: 3 Global Step: 39240 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:08:08,613-Speed 3399.52 samples/sec Loss 7.1666 LearningRate 0.0709 Epoch: 3 Global Step: 39250 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:08:11,713-Speed 3304.66 samples/sec Loss 7.0208 LearningRate 0.0709 Epoch: 3 Global Step: 39260 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:08:14,760-Speed 3362.65 samples/sec Loss 7.1294 LearningRate 0.0709 Epoch: 3 Global Step: 39270 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:08:17,801-Speed 3367.69 samples/sec Loss 7.2245 LearningRate 0.0709 Epoch: 3 Global Step: 39280 Fp16 Grad Scale: 16384 Required: 18 hours Training: 
2022-04-27 05:08:20,813-Speed 3400.66 samples/sec Loss 7.1484 LearningRate 0.0709 Epoch: 3 Global Step: 39290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:08:23,838-Speed 3386.13 samples/sec Loss 7.2545 LearningRate 0.0709 Epoch: 3 Global Step: 39300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:26,915-Speed 3329.94 samples/sec Loss 7.2535 LearningRate 0.0709 Epoch: 3 Global Step: 39310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:29,943-Speed 3382.29 samples/sec Loss 7.1803 LearningRate 0.0708 Epoch: 3 Global Step: 39320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:32,992-Speed 3360.63 samples/sec Loss 7.1677 LearningRate 0.0708 Epoch: 3 Global Step: 39330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:36,028-Speed 3373.83 samples/sec Loss 7.1557 LearningRate 0.0708 Epoch: 3 Global Step: 39340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:39,063-Speed 3374.74 samples/sec Loss 7.1445 LearningRate 0.0708 Epoch: 3 Global Step: 39350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:42,088-Speed 3385.51 samples/sec Loss 7.0983 LearningRate 0.0708 Epoch: 3 Global Step: 39360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:45,111-Speed 3389.42 samples/sec Loss 7.0915 LearningRate 0.0708 Epoch: 3 Global Step: 39370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:48,134-Speed 3388.68 samples/sec Loss 7.1408 LearningRate 0.0708 Epoch: 3 Global Step: 39380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:51,177-Speed 3365.49 samples/sec Loss 7.0480 LearningRate 0.0708 Epoch: 3 Global Step: 39390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:08:54,284-Speed 3297.29 samples/sec Loss 7.1117 LearningRate 0.0708 Epoch: 3 Global Step: 39400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:08:57,292-Speed 3405.00 samples/sec Loss 
7.2726 LearningRate 0.0708 Epoch: 3 Global Step: 39410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:00,369-Speed 3328.98 samples/sec Loss 7.2726 LearningRate 0.0708 Epoch: 3 Global Step: 39420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:03,424-Speed 3352.73 samples/sec Loss 7.2235 LearningRate 0.0708 Epoch: 3 Global Step: 39430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:06,452-Speed 3383.20 samples/sec Loss 7.3084 LearningRate 0.0708 Epoch: 3 Global Step: 39440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:09,494-Speed 3366.71 samples/sec Loss 7.1958 LearningRate 0.0708 Epoch: 3 Global Step: 39450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:12,550-Speed 3351.61 samples/sec Loss 7.2227 LearningRate 0.0708 Epoch: 3 Global Step: 39460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:15,622-Speed 3334.74 samples/sec Loss 7.2426 LearningRate 0.0707 Epoch: 3 Global Step: 39470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:18,658-Speed 3374.58 samples/sec Loss 7.2172 LearningRate 0.0707 Epoch: 3 Global Step: 39480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:21,688-Speed 3380.43 samples/sec Loss 7.2150 LearningRate 0.0707 Epoch: 3 Global Step: 39490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:24,804-Speed 3287.06 samples/sec Loss 7.2724 LearningRate 0.0707 Epoch: 3 Global Step: 39500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:27,893-Speed 3316.17 samples/sec Loss 7.1994 LearningRate 0.0707 Epoch: 3 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:09:30,970-Speed 3328.20 samples/sec Loss 7.2005 LearningRate 0.0707 Epoch: 3 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:09:33,991-Speed 3391.05 samples/sec Loss 7.2639 LearningRate 0.0707 Epoch: 3 Global Step: 39530 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:37,065-Speed 3332.13 samples/sec Loss 7.2953 LearningRate 0.0707 Epoch: 3 Global Step: 39540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:40,110-Speed 3364.54 samples/sec Loss 7.1127 LearningRate 0.0707 Epoch: 3 Global Step: 39550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:43,188-Speed 3327.96 samples/sec Loss 7.1734 LearningRate 0.0707 Epoch: 3 Global Step: 39560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:46,218-Speed 3380.06 samples/sec Loss 7.2029 LearningRate 0.0707 Epoch: 3 Global Step: 39570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:49,297-Speed 3327.36 samples/sec Loss 7.3378 LearningRate 0.0707 Epoch: 3 Global Step: 39580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:52,366-Speed 3337.61 samples/sec Loss 7.1864 LearningRate 0.0707 Epoch: 3 Global Step: 39590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:55,407-Speed 3368.64 samples/sec Loss 7.1359 LearningRate 0.0707 Epoch: 3 Global Step: 39600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:09:58,448-Speed 3368.08 samples/sec Loss 7.0892 LearningRate 0.0707 Epoch: 3 Global Step: 39610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:01,525-Speed 3329.15 samples/sec Loss 7.1243 LearningRate 0.0706 Epoch: 3 Global Step: 39620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:04,639-Speed 3289.21 samples/sec Loss 7.3766 LearningRate 0.0706 Epoch: 3 Global Step: 39630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:10:07,685-Speed 3363.59 samples/sec Loss 7.2885 LearningRate 0.0706 Epoch: 3 Global Step: 39640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:10:10,761-Speed 3329.66 samples/sec Loss 7.2023 LearningRate 0.0706 Epoch: 3 Global Step: 39650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 05:10:13,822-Speed 3347.09 samples/sec Loss 7.1154 LearningRate 0.0706 Epoch: 3 Global Step: 39660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:16,883-Speed 3346.62 samples/sec Loss 7.2482 LearningRate 0.0706 Epoch: 3 Global Step: 39670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:19,916-Speed 3376.55 samples/sec Loss 7.0908 LearningRate 0.0706 Epoch: 3 Global Step: 39680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:22,951-Speed 3375.96 samples/sec Loss 7.2986 LearningRate 0.0706 Epoch: 3 Global Step: 39690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:26,030-Speed 3326.32 samples/sec Loss 7.2866 LearningRate 0.0706 Epoch: 3 Global Step: 39700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:29,094-Speed 3342.92 samples/sec Loss 7.0712 LearningRate 0.0706 Epoch: 3 Global Step: 39710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:32,157-Speed 3345.02 samples/sec Loss 7.1516 LearningRate 0.0706 Epoch: 3 Global Step: 39720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:35,195-Speed 3371.86 samples/sec Loss 7.2221 LearningRate 0.0706 Epoch: 3 Global Step: 39730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:38,223-Speed 3382.38 samples/sec Loss 7.1954 LearningRate 0.0706 Epoch: 3 Global Step: 39740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:41,281-Speed 3350.05 samples/sec Loss 7.0743 LearningRate 0.0706 Epoch: 3 Global Step: 39750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:44,300-Speed 3392.67 samples/sec Loss 7.2318 LearningRate 0.0706 Epoch: 3 Global Step: 39760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:10:47,322-Speed 3390.01 samples/sec Loss 7.1869 LearningRate 0.0705 Epoch: 3 Global Step: 39770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:50,362-Speed 3369.97 samples/sec Loss 
7.1951 LearningRate 0.0705 Epoch: 3 Global Step: 39780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:53,443-Speed 3324.08 samples/sec Loss 7.2607 LearningRate 0.0705 Epoch: 3 Global Step: 39790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:56,494-Speed 3358.37 samples/sec Loss 7.2727 LearningRate 0.0705 Epoch: 3 Global Step: 39800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:10:59,542-Speed 3360.26 samples/sec Loss 7.2118 LearningRate 0.0705 Epoch: 3 Global Step: 39810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:02,613-Speed 3335.22 samples/sec Loss 7.3320 LearningRate 0.0705 Epoch: 3 Global Step: 39820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:05,681-Speed 3338.79 samples/sec Loss 7.2689 LearningRate 0.0705 Epoch: 3 Global Step: 39830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:08,686-Speed 3409.22 samples/sec Loss 7.2350 LearningRate 0.0705 Epoch: 3 Global Step: 39840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:11,748-Speed 3345.06 samples/sec Loss 7.2613 LearningRate 0.0705 Epoch: 3 Global Step: 39850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:14,828-Speed 3326.09 samples/sec Loss 7.3135 LearningRate 0.0705 Epoch: 3 Global Step: 39860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:17,904-Speed 3329.48 samples/sec Loss 7.2880 LearningRate 0.0705 Epoch: 3 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:11:20,963-Speed 3348.90 samples/sec Loss 7.2449 LearningRate 0.0705 Epoch: 3 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:11:24,007-Speed 3364.99 samples/sec Loss 7.2691 LearningRate 0.0705 Epoch: 3 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:11:27,043-Speed 3373.84 samples/sec Loss 7.2083 LearningRate 0.0705 Epoch: 3 Global Step: 39900 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:30,093-Speed 3358.52 samples/sec Loss 7.2380 LearningRate 0.0704 Epoch: 3 Global Step: 39910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:33,113-Speed 3392.56 samples/sec Loss 7.2357 LearningRate 0.0704 Epoch: 3 Global Step: 39920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:11:36,157-Speed 3365.53 samples/sec Loss 7.2618 LearningRate 0.0704 Epoch: 3 Global Step: 39930 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:11:39,230-Speed 3332.25 samples/sec Loss 7.2662 LearningRate 0.0704 Epoch: 3 Global Step: 39940 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:11:42,275-Speed 3364.25 samples/sec Loss 7.3239 LearningRate 0.0704 Epoch: 3 Global Step: 39950 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:11:45,307-Speed 3379.25 samples/sec Loss 7.3634 LearningRate 0.0704 Epoch: 3 Global Step: 39960 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:11:48,425-Speed 3285.13 samples/sec Loss 7.3175 LearningRate 0.0704 Epoch: 3 Global Step: 39970 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:11:51,496-Speed 3335.56 samples/sec Loss 7.3468 LearningRate 0.0704 Epoch: 3 Global Step: 39980 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:11:54,559-Speed 3343.71 samples/sec Loss 7.3210 LearningRate 0.0704 Epoch: 3 Global Step: 39990 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:11:57,616-Speed 3351.26 samples/sec Loss 7.3275 LearningRate 0.0704 Epoch: 3 Global Step: 40000 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:00,718-Speed 3301.37 samples/sec Loss 7.3448 LearningRate 0.0704 Epoch: 3 Global Step: 40010 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:03,799-Speed 3324.75 samples/sec Loss 7.2822 LearningRate 0.0704 Epoch: 3 Global Step: 40020 Fp16 Grad Scale: 16384 Required: 18 hours Training: 
2022-04-27 05:12:06,822-Speed 3389.45 samples/sec Loss 7.3913 LearningRate 0.0704 Epoch: 3 Global Step: 40030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:12:09,828-Speed 3406.51 samples/sec Loss 7.3396 LearningRate 0.0704 Epoch: 3 Global Step: 40040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:12:12,866-Speed 3372.12 samples/sec Loss 7.4412 LearningRate 0.0704 Epoch: 3 Global Step: 40050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:12:15,910-Speed 3365.40 samples/sec Loss 7.2908 LearningRate 0.0703 Epoch: 3 Global Step: 40060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:12:19,002-Speed 3312.12 samples/sec Loss 7.3386 LearningRate 0.0703 Epoch: 3 Global Step: 40070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:12:22,035-Speed 3377.60 samples/sec Loss 7.3073 LearningRate 0.0703 Epoch: 3 Global Step: 40080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:12:25,056-Speed 3391.50 samples/sec Loss 7.3041 LearningRate 0.0703 Epoch: 3 Global Step: 40090 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:28,157-Speed 3302.38 samples/sec Loss 7.2208 LearningRate 0.0703 Epoch: 3 Global Step: 40100 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:31,206-Speed 3359.66 samples/sec Loss 7.3804 LearningRate 0.0703 Epoch: 3 Global Step: 40110 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:34,211-Speed 3408.55 samples/sec Loss 7.2549 LearningRate 0.0703 Epoch: 3 Global Step: 40120 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:37,252-Speed 3368.08 samples/sec Loss 7.2604 LearningRate 0.0703 Epoch: 3 Global Step: 40130 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:40,279-Speed 3384.57 samples/sec Loss 7.3978 LearningRate 0.0703 Epoch: 3 Global Step: 40140 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:43,300-Speed 3390.95 samples/sec Loss 
7.1860 LearningRate 0.0703 Epoch: 3 Global Step: 40150 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:46,324-Speed 3387.07 samples/sec Loss 7.2139 LearningRate 0.0703 Epoch: 3 Global Step: 40160 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:49,324-Speed 3414.44 samples/sec Loss 7.3920 LearningRate 0.0703 Epoch: 3 Global Step: 40170 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:52,424-Speed 3303.62 samples/sec Loss 7.3568 LearningRate 0.0703 Epoch: 3 Global Step: 40180 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:12:55,461-Speed 3372.88 samples/sec Loss 7.2799 LearningRate 0.0703 Epoch: 3 Global Step: 40190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:12:58,472-Speed 3402.59 samples/sec Loss 7.3874 LearningRate 0.0703 Epoch: 3 Global Step: 40200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:01,499-Speed 3384.12 samples/sec Loss 7.4245 LearningRate 0.0702 Epoch: 3 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:04,601-Speed 3301.66 samples/sec Loss 7.3009 LearningRate 0.0702 Epoch: 3 Global Step: 40220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:07,679-Speed 3328.67 samples/sec Loss 7.3883 LearningRate 0.0702 Epoch: 3 Global Step: 40230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:10,665-Speed 3430.22 samples/sec Loss 7.3112 LearningRate 0.0702 Epoch: 3 Global Step: 40240 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:13,658-Speed 3421.80 samples/sec Loss 7.2554 LearningRate 0.0702 Epoch: 3 Global Step: 40250 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:16,688-Speed 3381.43 samples/sec Loss 7.3997 LearningRate 0.0702 Epoch: 3 Global Step: 40260 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:19,756-Speed 3338.28 samples/sec Loss 7.2916 LearningRate 0.0702 Epoch: 3 Global Step: 40270 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:22,768-Speed 3401.46 samples/sec Loss 7.3065 LearningRate 0.0702 Epoch: 3 Global Step: 40280 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:25,847-Speed 3326.37 samples/sec Loss 7.4662 LearningRate 0.0702 Epoch: 3 Global Step: 40290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:28,911-Speed 3343.29 samples/sec Loss 7.3237 LearningRate 0.0702 Epoch: 3 Global Step: 40300 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:31,929-Speed 3394.21 samples/sec Loss 7.3286 LearningRate 0.0702 Epoch: 3 Global Step: 40310 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:34,989-Speed 3346.90 samples/sec Loss 7.2630 LearningRate 0.0702 Epoch: 3 Global Step: 40320 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:38,108-Speed 3283.94 samples/sec Loss 7.2817 LearningRate 0.0702 Epoch: 3 Global Step: 40330 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:41,164-Speed 3352.10 samples/sec Loss 7.4884 LearningRate 0.0702 Epoch: 3 Global Step: 40340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:44,165-Speed 3413.27 samples/sec Loss 7.3889 LearningRate 0.0702 Epoch: 3 Global Step: 40350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:47,259-Speed 3311.46 samples/sec Loss 7.3800 LearningRate 0.0701 Epoch: 3 Global Step: 40360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:50,371-Speed 3291.00 samples/sec Loss 7.3577 LearningRate 0.0701 Epoch: 3 Global Step: 40370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:13:53,375-Speed 3410.37 samples/sec Loss 7.3098 LearningRate 0.0701 Epoch: 3 Global Step: 40380 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:13:56,405-Speed 3380.24 samples/sec Loss 7.4271 LearningRate 0.0701 Epoch: 3 Global Step: 40390 Fp16 Grad Scale: 16384 Required: 18 hours Training: 
2022-04-27 05:13:59,432-Speed 3383.81 samples/sec Loss 7.3012 LearningRate 0.0701 Epoch: 3 Global Step: 40400 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:02,467-Speed 3375.86 samples/sec Loss 7.3924 LearningRate 0.0701 Epoch: 3 Global Step: 40410 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:05,542-Speed 3330.25 samples/sec Loss 7.5183 LearningRate 0.0701 Epoch: 3 Global Step: 40420 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:08,574-Speed 3378.57 samples/sec Loss 7.3273 LearningRate 0.0701 Epoch: 3 Global Step: 40430 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:11,617-Speed 3366.70 samples/sec Loss 7.3285 LearningRate 0.0701 Epoch: 3 Global Step: 40440 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:14,659-Speed 3367.55 samples/sec Loss 7.3385 LearningRate 0.0701 Epoch: 3 Global Step: 40450 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:17,671-Speed 3400.34 samples/sec Loss 7.3563 LearningRate 0.0701 Epoch: 3 Global Step: 40460 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:20,691-Speed 3392.39 samples/sec Loss 7.3344 LearningRate 0.0701 Epoch: 3 Global Step: 40470 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:14:23,740-Speed 3359.34 samples/sec Loss 7.3990 LearningRate 0.0701 Epoch: 3 Global Step: 40480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:26,763-Speed 3388.29 samples/sec Loss 7.3636 LearningRate 0.0701 Epoch: 3 Global Step: 40490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:29,820-Speed 3350.86 samples/sec Loss 7.1819 LearningRate 0.0701 Epoch: 3 Global Step: 40500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:32,821-Speed 3413.34 samples/sec Loss 7.4274 LearningRate 0.0700 Epoch: 3 Global Step: 40510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:35,843-Speed 3390.19 samples/sec Loss 
7.4308 LearningRate 0.0700 Epoch: 3 Global Step: 40520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:38,850-Speed 3406.60 samples/sec Loss 7.3274 LearningRate 0.0700 Epoch: 3 Global Step: 40530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:41,873-Speed 3388.25 samples/sec Loss 7.4597 LearningRate 0.0700 Epoch: 3 Global Step: 40540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:44,887-Speed 3398.39 samples/sec Loss 7.3244 LearningRate 0.0700 Epoch: 3 Global Step: 40550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:47,914-Speed 3384.15 samples/sec Loss 7.3727 LearningRate 0.0700 Epoch: 3 Global Step: 40560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:51,007-Speed 3310.95 samples/sec Loss 7.3667 LearningRate 0.0700 Epoch: 3 Global Step: 40570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:14:54,177-Speed 3231.32 samples/sec Loss 7.3266 LearningRate 0.0700 Epoch: 3 Global Step: 40580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:14:57,189-Speed 3401.09 samples/sec Loss 7.2815 LearningRate 0.0700 Epoch: 3 Global Step: 40590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:15:00,220-Speed 3380.29 samples/sec Loss 7.4314 LearningRate 0.0700 Epoch: 3 Global Step: 40600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:15:03,258-Speed 3371.79 samples/sec Loss 7.4728 LearningRate 0.0700 Epoch: 3 Global Step: 40610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:15:06,253-Speed 3419.37 samples/sec Loss 7.5442 LearningRate 0.0700 Epoch: 3 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:15:09,282-Speed 3382.11 samples/sec Loss 7.3095 LearningRate 0.0700 Epoch: 3 Global Step: 40630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:15:12,314-Speed 3392.25 samples/sec Loss 7.3903 LearningRate 0.0700 Epoch: 3 Global Step: 40640 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:15:15,351-Speed 3373.54 samples/sec Loss 7.4116 LearningRate 0.0700 Epoch: 3 Global Step: 40650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:18,362-Speed 3400.99 samples/sec Loss 7.4677 LearningRate 0.0699 Epoch: 3 Global Step: 40660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:21,372-Speed 3403.81 samples/sec Loss 7.3188 LearningRate 0.0699 Epoch: 3 Global Step: 40670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:24,391-Speed 3393.23 samples/sec Loss 7.3176 LearningRate 0.0699 Epoch: 3 Global Step: 40680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:27,523-Speed 3269.57 samples/sec Loss 7.3798 LearningRate 0.0699 Epoch: 3 Global Step: 40690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:30,581-Speed 3350.38 samples/sec Loss 7.4430 LearningRate 0.0699 Epoch: 3 Global Step: 40700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:33,604-Speed 3387.95 samples/sec Loss 7.4569 LearningRate 0.0699 Epoch: 3 Global Step: 40710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:36,675-Speed 3335.04 samples/sec Loss 7.3052 LearningRate 0.0699 Epoch: 3 Global Step: 40720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:39,717-Speed 3367.90 samples/sec Loss 7.3361 LearningRate 0.0699 Epoch: 3 Global Step: 40730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:42,800-Speed 3322.47 samples/sec Loss 7.3021 LearningRate 0.0699 Epoch: 3 Global Step: 40740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:45,811-Speed 3402.61 samples/sec Loss 7.3124 LearningRate 0.0699 Epoch: 3 Global Step: 40750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:15:48,834-Speed 3387.82 samples/sec Loss 7.4247 LearningRate 0.0699 Epoch: 3 Global Step: 40760 Fp16 Grad Scale: 16384 Required: 18 hours Training: 
2022-04-27 05:15:51,863-Speed 3381.40 samples/sec Loss 7.3774 LearningRate 0.0699 Epoch: 3 Global Step: 40770 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:15:54,965-Speed 3302.92 samples/sec Loss 7.3460 LearningRate 0.0699 Epoch: 3 Global Step: 40780 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:15:57,973-Speed 3405.98 samples/sec Loss 7.3481 LearningRate 0.0699 Epoch: 3 Global Step: 40790 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:16:01,075-Speed 3301.35 samples/sec Loss 7.3711 LearningRate 0.0698 Epoch: 3 Global Step: 40800 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:16:04,118-Speed 3365.90 samples/sec Loss 7.3829 LearningRate 0.0698 Epoch: 3 Global Step: 40810 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:16:07,151-Speed 3378.18 samples/sec Loss 7.4504 LearningRate 0.0698 Epoch: 3 Global Step: 40820 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:16:10,150-Speed 3415.46 samples/sec Loss 7.3051 LearningRate 0.0698 Epoch: 3 Global Step: 40830 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:16:13,251-Speed 3302.92 samples/sec Loss 7.3693 LearningRate 0.0698 Epoch: 3 Global Step: 40840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:16:16,285-Speed 3375.56 samples/sec Loss 7.3997 LearningRate 0.0698 Epoch: 3 Global Step: 40850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:16:19,332-Speed 3362.11 samples/sec Loss 7.3659 LearningRate 0.0698 Epoch: 3 Global Step: 40860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:22,363-Speed 3379.39 samples/sec Loss 7.5331 LearningRate 0.0698 Epoch: 3 Global Step: 40870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:25,425-Speed 3345.69 samples/sec Loss 7.3871 LearningRate 0.0698 Epoch: 3 Global Step: 40880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:28,480-Speed 3352.97 samples/sec Loss 
7.3428 LearningRate 0.0698 Epoch: 3 Global Step: 40890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:31,509-Speed 3381.04 samples/sec Loss 7.3952 LearningRate 0.0698 Epoch: 3 Global Step: 40900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:34,530-Speed 3390.88 samples/sec Loss 7.4604 LearningRate 0.0698 Epoch: 3 Global Step: 40910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:37,559-Speed 3381.12 samples/sec Loss 7.4344 LearningRate 0.0698 Epoch: 3 Global Step: 40920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:40,606-Speed 3362.66 samples/sec Loss 7.4208 LearningRate 0.0698 Epoch: 3 Global Step: 40930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:43,623-Speed 3395.13 samples/sec Loss 7.4489 LearningRate 0.0698 Epoch: 3 Global Step: 40940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:46,683-Speed 3346.73 samples/sec Loss 7.2648 LearningRate 0.0697 Epoch: 3 Global Step: 40950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:49,705-Speed 3389.50 samples/sec Loss 7.4111 LearningRate 0.0697 Epoch: 3 Global Step: 40960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:16:52,801-Speed 3309.08 samples/sec Loss 7.4302 LearningRate 0.0697 Epoch: 3 Global Step: 40970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:55,846-Speed 3363.97 samples/sec Loss 7.3331 LearningRate 0.0697 Epoch: 3 Global Step: 40980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:16:58,862-Speed 3395.75 samples/sec Loss 7.4130 LearningRate 0.0697 Epoch: 3 Global Step: 40990 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:01,918-Speed 3352.04 samples/sec Loss 7.4413 LearningRate 0.0697 Epoch: 3 Global Step: 41000 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:04,926-Speed 3405.39 samples/sec Loss 7.4067 LearningRate 0.0697 Epoch: 3 Global Step: 41010 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:07,945-Speed 3392.98 samples/sec Loss 7.3236 LearningRate 0.0697 Epoch: 3 Global Step: 41020 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:10,990-Speed 3364.26 samples/sec Loss 7.4325 LearningRate 0.0697 Epoch: 3 Global Step: 41030 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:14,062-Speed 3334.47 samples/sec Loss 7.3221 LearningRate 0.0697 Epoch: 3 Global Step: 41040 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:17,068-Speed 3407.52 samples/sec Loss 7.4994 LearningRate 0.0697 Epoch: 3 Global Step: 41050 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:20,103-Speed 3375.58 samples/sec Loss 7.4331 LearningRate 0.0697 Epoch: 3 Global Step: 41060 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:23,130-Speed 3383.18 samples/sec Loss 7.4829 LearningRate 0.0697 Epoch: 3 Global Step: 41070 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:26,188-Speed 3350.65 samples/sec Loss 7.4745 LearningRate 0.0697 Epoch: 3 Global Step: 41080 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:17:29,219-Speed 3379.54 samples/sec Loss 7.4439 LearningRate 0.0697 Epoch: 3 Global Step: 41090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:32,232-Speed 3399.67 samples/sec Loss 7.4367 LearningRate 0.0696 Epoch: 3 Global Step: 41100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:35,225-Speed 3421.78 samples/sec Loss 7.5124 LearningRate 0.0696 Epoch: 3 Global Step: 41110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:38,243-Speed 3394.11 samples/sec Loss 7.4295 LearningRate 0.0696 Epoch: 3 Global Step: 41120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:41,314-Speed 3335.41 samples/sec Loss 7.5225 LearningRate 0.0696 Epoch: 3 Global Step: 41130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:17:44,367-Speed 3355.06 samples/sec Loss 7.4652 LearningRate 0.0696 Epoch: 3 Global Step: 41140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:47,400-Speed 3377.29 samples/sec Loss 7.3932 LearningRate 0.0696 Epoch: 3 Global Step: 41150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:50,427-Speed 3383.52 samples/sec Loss 7.4099 LearningRate 0.0696 Epoch: 3 Global Step: 41160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:53,530-Speed 3302.02 samples/sec Loss 7.4315 LearningRate 0.0696 Epoch: 3 Global Step: 41170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:56,580-Speed 3358.50 samples/sec Loss 7.4492 LearningRate 0.0696 Epoch: 3 Global Step: 41180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:17:59,638-Speed 3349.51 samples/sec Loss 7.4799 LearningRate 0.0696 Epoch: 3 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:02,663-Speed 3385.80 samples/sec Loss 7.4391 LearningRate 0.0696 Epoch: 3 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:05,661-Speed 3417.95 samples/sec Loss 7.3309 LearningRate 0.0696 Epoch: 3 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:08,660-Speed 3415.19 samples/sec Loss 7.4694 LearningRate 0.0696 Epoch: 3 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:11,718-Speed 3349.64 samples/sec Loss 7.4011 LearningRate 0.0696 Epoch: 3 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:14,840-Speed 3281.05 samples/sec Loss 7.4208 LearningRate 0.0696 Epoch: 3 Global Step: 41240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:17,925-Speed 3320.24 samples/sec Loss 7.5390 LearningRate 0.0695 Epoch: 3 Global Step: 41250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:20,943-Speed 3392.86 samples/sec Loss 
7.4781 LearningRate 0.0695 Epoch: 3 Global Step: 41260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:23,987-Speed 3365.66 samples/sec Loss 7.5924 LearningRate 0.0695 Epoch: 3 Global Step: 41270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:27,057-Speed 3336.19 samples/sec Loss 7.4893 LearningRate 0.0695 Epoch: 3 Global Step: 41280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:30,109-Speed 3356.16 samples/sec Loss 7.3124 LearningRate 0.0695 Epoch: 3 Global Step: 41290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:33,142-Speed 3377.28 samples/sec Loss 7.4304 LearningRate 0.0695 Epoch: 3 Global Step: 41300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:36,176-Speed 3376.24 samples/sec Loss 7.2853 LearningRate 0.0695 Epoch: 3 Global Step: 41310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:39,215-Speed 3370.46 samples/sec Loss 7.3445 LearningRate 0.0695 Epoch: 3 Global Step: 41320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:42,277-Speed 3345.93 samples/sec Loss 7.4497 LearningRate 0.0695 Epoch: 3 Global Step: 41330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:45,300-Speed 3388.72 samples/sec Loss 7.3265 LearningRate 0.0695 Epoch: 3 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:48,367-Speed 3340.08 samples/sec Loss 7.4488 LearningRate 0.0695 Epoch: 3 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:51,436-Speed 3337.97 samples/sec Loss 7.4645 LearningRate 0.0695 Epoch: 3 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:18:54,458-Speed 3388.62 samples/sec Loss 7.4714 LearningRate 0.0695 Epoch: 3 Global Step: 41370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:18:57,466-Speed 3406.08 samples/sec Loss 7.5000 LearningRate 0.0695 Epoch: 3 Global Step: 41380 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:00,570-Speed 3299.60 samples/sec Loss 7.4507 LearningRate 0.0695 Epoch: 3 Global Step: 41390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:03,665-Speed 3310.43 samples/sec Loss 7.5480 LearningRate 0.0694 Epoch: 3 Global Step: 41400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:06,719-Speed 3353.45 samples/sec Loss 7.3450 LearningRate 0.0694 Epoch: 3 Global Step: 41410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:09,746-Speed 3384.95 samples/sec Loss 7.4509 LearningRate 0.0694 Epoch: 3 Global Step: 41420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:12,797-Speed 3356.69 samples/sec Loss 7.4823 LearningRate 0.0694 Epoch: 3 Global Step: 41430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:15,955-Speed 3244.20 samples/sec Loss 7.4337 LearningRate 0.0694 Epoch: 3 Global Step: 41440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:19,029-Speed 3331.54 samples/sec Loss 7.5148 LearningRate 0.0694 Epoch: 3 Global Step: 41450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:22,028-Speed 3415.38 samples/sec Loss 7.3839 LearningRate 0.0694 Epoch: 3 Global Step: 41460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:25,059-Speed 3380.45 samples/sec Loss 7.4199 LearningRate 0.0694 Epoch: 3 Global Step: 41470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:19:28,145-Speed 3319.39 samples/sec Loss 7.5400 LearningRate 0.0694 Epoch: 3 Global Step: 41480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:31,209-Speed 3343.31 samples/sec Loss 7.4599 LearningRate 0.0694 Epoch: 3 Global Step: 41490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:34,236-Speed 3383.07 samples/sec Loss 7.4237 LearningRate 0.0694 Epoch: 3 Global Step: 41500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:19:37,299-Speed 3343.90 samples/sec Loss 7.4308 LearningRate 0.0694 Epoch: 3 Global Step: 41510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:40,381-Speed 3324.56 samples/sec Loss 7.4493 LearningRate 0.0694 Epoch: 3 Global Step: 41520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:43,449-Speed 3337.65 samples/sec Loss 7.5362 LearningRate 0.0694 Epoch: 3 Global Step: 41530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:46,488-Speed 3371.76 samples/sec Loss 7.4398 LearningRate 0.0694 Epoch: 3 Global Step: 41540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:49,536-Speed 3360.40 samples/sec Loss 7.5161 LearningRate 0.0693 Epoch: 3 Global Step: 41550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:52,572-Speed 3373.50 samples/sec Loss 7.3908 LearningRate 0.0693 Epoch: 3 Global Step: 41560 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:55,595-Speed 3388.57 samples/sec Loss 7.5492 LearningRate 0.0693 Epoch: 3 Global Step: 41570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:19:58,634-Speed 3370.76 samples/sec Loss 7.4995 LearningRate 0.0693 Epoch: 3 Global Step: 41580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:01,677-Speed 3366.00 samples/sec Loss 7.4229 LearningRate 0.0693 Epoch: 3 Global Step: 41590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:04,748-Speed 3334.63 samples/sec Loss 7.3723 LearningRate 0.0693 Epoch: 3 Global Step: 41600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:07,766-Speed 3394.46 samples/sec Loss 7.4631 LearningRate 0.0693 Epoch: 3 Global Step: 41610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:10,803-Speed 3372.98 samples/sec Loss 7.4067 LearningRate 0.0693 Epoch: 3 Global Step: 41620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:13,857-Speed 3353.70 samples/sec Loss 
7.4946 LearningRate 0.0693 Epoch: 3 Global Step: 41630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:16,915-Speed 3349.56 samples/sec Loss 7.3831 LearningRate 0.0693 Epoch: 3 Global Step: 41640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:19,961-Speed 3362.99 samples/sec Loss 7.6026 LearningRate 0.0693 Epoch: 3 Global Step: 41650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:22,996-Speed 3375.76 samples/sec Loss 7.5414 LearningRate 0.0693 Epoch: 3 Global Step: 41660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:26,089-Speed 3310.65 samples/sec Loss 7.3865 LearningRate 0.0693 Epoch: 3 Global Step: 41670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:29,164-Speed 3332.23 samples/sec Loss 7.5109 LearningRate 0.0693 Epoch: 3 Global Step: 41680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:32,187-Speed 3388.23 samples/sec Loss 7.5252 LearningRate 0.0693 Epoch: 3 Global Step: 41690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:35,241-Speed 3354.25 samples/sec Loss 7.5259 LearningRate 0.0692 Epoch: 3 Global Step: 41700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:38,248-Speed 3405.75 samples/sec Loss 7.5259 LearningRate 0.0692 Epoch: 3 Global Step: 41710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:20:41,287-Speed 3370.85 samples/sec Loss 7.4891 LearningRate 0.0692 Epoch: 3 Global Step: 41720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:44,350-Speed 3344.45 samples/sec Loss 7.3530 LearningRate 0.0692 Epoch: 3 Global Step: 41730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:47,404-Speed 3353.68 samples/sec Loss 7.5187 LearningRate 0.0692 Epoch: 3 Global Step: 41740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:50,460-Speed 3352.54 samples/sec Loss 7.4653 LearningRate 0.0692 Epoch: 3 Global Step: 41750 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:53,496-Speed 3373.30 samples/sec Loss 7.3842 LearningRate 0.0692 Epoch: 3 Global Step: 41760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:56,536-Speed 3369.95 samples/sec Loss 7.5583 LearningRate 0.0692 Epoch: 3 Global Step: 41770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:20:59,572-Speed 3373.19 samples/sec Loss 7.5305 LearningRate 0.0692 Epoch: 3 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:21:02,654-Speed 3323.63 samples/sec Loss 7.4955 LearningRate 0.0692 Epoch: 3 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:21:05,717-Speed 3344.62 samples/sec Loss 7.4781 LearningRate 0.0692 Epoch: 3 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:21:08,770-Speed 3354.98 samples/sec Loss 7.4758 LearningRate 0.0692 Epoch: 3 Global Step: 41810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:21:11,801-Speed 3379.09 samples/sec Loss 7.5407 LearningRate 0.0692 Epoch: 3 Global Step: 41820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:21:14,858-Speed 3351.42 samples/sec Loss 7.4962 LearningRate 0.0692 Epoch: 3 Global Step: 41830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:17,903-Speed 3364.08 samples/sec Loss 7.5766 LearningRate 0.0692 Epoch: 3 Global Step: 41840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:20,926-Speed 3388.30 samples/sec Loss 7.3091 LearningRate 0.0691 Epoch: 3 Global Step: 41850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:23,978-Speed 3355.82 samples/sec Loss 7.5907 LearningRate 0.0691 Epoch: 3 Global Step: 41860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:27,035-Speed 3350.64 samples/sec Loss 7.3748 LearningRate 0.0691 Epoch: 3 Global Step: 41870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:21:30,108-Speed 3333.67 samples/sec Loss 7.4051 LearningRate 0.0691 Epoch: 3 Global Step: 41880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:33,113-Speed 3408.17 samples/sec Loss 7.4119 LearningRate 0.0691 Epoch: 3 Global Step: 41890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:36,124-Speed 3402.20 samples/sec Loss 7.5106 LearningRate 0.0691 Epoch: 3 Global Step: 41900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:39,157-Speed 3377.78 samples/sec Loss 7.5434 LearningRate 0.0691 Epoch: 3 Global Step: 41910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:42,220-Speed 3344.27 samples/sec Loss 7.3854 LearningRate 0.0691 Epoch: 3 Global Step: 41920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:21:45,216-Speed 3418.84 samples/sec Loss 7.5102 LearningRate 0.0691 Epoch: 3 Global Step: 41930 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:21:48,274-Speed 3349.57 samples/sec Loss 7.4649 LearningRate 0.0691 Epoch: 3 Global Step: 41940 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:21:51,380-Speed 3297.79 samples/sec Loss 7.4998 LearningRate 0.0691 Epoch: 3 Global Step: 41950 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:21:54,459-Speed 3327.46 samples/sec Loss 7.4703 LearningRate 0.0691 Epoch: 3 Global Step: 41960 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:21:57,494-Speed 3374.87 samples/sec Loss 7.4084 LearningRate 0.0691 Epoch: 3 Global Step: 41970 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:00,526-Speed 3378.70 samples/sec Loss 7.4678 LearningRate 0.0691 Epoch: 3 Global Step: 41980 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:03,569-Speed 3365.49 samples/sec Loss 7.4546 LearningRate 0.0691 Epoch: 3 Global Step: 41990 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:06,595-Speed 3385.71 samples/sec Loss 
7.5350 LearningRate 0.0690 Epoch: 3 Global Step: 42000 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:09,621-Speed 3385.01 samples/sec Loss 7.5193 LearningRate 0.0690 Epoch: 3 Global Step: 42010 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:12,645-Speed 3387.65 samples/sec Loss 7.5254 LearningRate 0.0690 Epoch: 3 Global Step: 42020 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:15,673-Speed 3382.71 samples/sec Loss 7.5178 LearningRate 0.0690 Epoch: 3 Global Step: 42030 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:22:18,780-Speed 3296.11 samples/sec Loss 7.5007 LearningRate 0.0690 Epoch: 3 Global Step: 42040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:22:21,771-Speed 3424.68 samples/sec Loss 7.3767 LearningRate 0.0690 Epoch: 3 Global Step: 42050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:22:24,791-Speed 3391.74 samples/sec Loss 7.5141 LearningRate 0.0690 Epoch: 3 Global Step: 42060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:22:27,793-Speed 3412.54 samples/sec Loss 7.4236 LearningRate 0.0690 Epoch: 3 Global Step: 42070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:22:30,820-Speed 3383.61 samples/sec Loss 7.5506 LearningRate 0.0690 Epoch: 3 Global Step: 42080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:22:33,834-Speed 3398.59 samples/sec Loss 7.4704 LearningRate 0.0690 Epoch: 3 Global Step: 42090 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:36,881-Speed 3362.36 samples/sec Loss 7.5148 LearningRate 0.0690 Epoch: 3 Global Step: 42100 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:39,936-Speed 3352.48 samples/sec Loss 7.4640 LearningRate 0.0690 Epoch: 3 Global Step: 42110 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:43,017-Speed 3325.36 samples/sec Loss 7.5796 LearningRate 0.0690 Epoch: 3 Global Step: 42120 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:46,015-Speed 3416.49 samples/sec Loss 7.4539 LearningRate 0.0690 Epoch: 3 Global Step: 42130 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:49,078-Speed 3343.75 samples/sec Loss 7.5169 LearningRate 0.0690 Epoch: 3 Global Step: 42140 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:52,128-Speed 3359.12 samples/sec Loss 7.4742 LearningRate 0.0689 Epoch: 3 Global Step: 42150 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:55,170-Speed 3367.54 samples/sec Loss 7.5030 LearningRate 0.0689 Epoch: 3 Global Step: 42160 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:22:58,169-Speed 3415.50 samples/sec Loss 7.4319 LearningRate 0.0689 Epoch: 3 Global Step: 42170 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:01,219-Speed 3357.90 samples/sec Loss 7.5019 LearningRate 0.0689 Epoch: 3 Global Step: 42180 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:04,325-Speed 3298.07 samples/sec Loss 7.6340 LearningRate 0.0689 Epoch: 3 Global Step: 42190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:23:07,379-Speed 3353.83 samples/sec Loss 7.4757 LearningRate 0.0689 Epoch: 3 Global Step: 42200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:23:10,413-Speed 3378.00 samples/sec Loss 7.5704 LearningRate 0.0689 Epoch: 3 Global Step: 42210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:23:13,489-Speed 3330.03 samples/sec Loss 7.4261 LearningRate 0.0689 Epoch: 3 Global Step: 42220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:23:16,580-Speed 3314.37 samples/sec Loss 7.4965 LearningRate 0.0689 Epoch: 3 Global Step: 42230 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:23:19,637-Speed 3350.12 samples/sec Loss 7.6435 LearningRate 0.0689 Epoch: 3 Global Step: 42240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:23:22,641-Speed 3410.48 samples/sec Loss 7.7089 LearningRate 0.0689 Epoch: 3 Global Step: 42250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:23:25,698-Speed 3351.00 samples/sec Loss 7.6282 LearningRate 0.0689 Epoch: 3 Global Step: 42260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:23:28,735-Speed 3372.36 samples/sec Loss 7.5759 LearningRate 0.0689 Epoch: 3 Global Step: 42270 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:31,744-Speed 3404.43 samples/sec Loss 7.4810 LearningRate 0.0689 Epoch: 3 Global Step: 42280 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:34,781-Speed 3372.37 samples/sec Loss 7.4719 LearningRate 0.0689 Epoch: 3 Global Step: 42290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:37,785-Speed 3410.16 samples/sec Loss 7.4988 LearningRate 0.0688 Epoch: 3 Global Step: 42300 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:40,801-Speed 3396.68 samples/sec Loss 7.5200 LearningRate 0.0688 Epoch: 3 Global Step: 42310 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:43,824-Speed 3388.76 samples/sec Loss 7.4843 LearningRate 0.0688 Epoch: 3 Global Step: 42320 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:46,833-Speed 3404.28 samples/sec Loss 7.4201 LearningRate 0.0688 Epoch: 3 Global Step: 42330 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:49,899-Speed 3341.29 samples/sec Loss 7.4745 LearningRate 0.0688 Epoch: 3 Global Step: 42340 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:52,947-Speed 3360.75 samples/sec Loss 7.5872 LearningRate 0.0688 Epoch: 3 Global Step: 42350 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:56,041-Speed 3310.63 samples/sec Loss 7.4851 LearningRate 0.0688 Epoch: 3 Global Step: 42360 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:23:59,059-Speed 3393.97 samples/sec Loss 
7.4689 LearningRate 0.0688 Epoch: 3 Global Step: 42370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:02,138-Speed 3326.89 samples/sec Loss 7.5792 LearningRate 0.0688 Epoch: 3 Global Step: 42380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:05,207-Speed 3337.65 samples/sec Loss 7.6051 LearningRate 0.0688 Epoch: 3 Global Step: 42390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:08,266-Speed 3349.02 samples/sec Loss 7.5532 LearningRate 0.0688 Epoch: 3 Global Step: 42400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:11,384-Speed 3284.64 samples/sec Loss 7.4898 LearningRate 0.0688 Epoch: 3 Global Step: 42410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:14,420-Speed 3373.84 samples/sec Loss 7.4450 LearningRate 0.0688 Epoch: 3 Global Step: 42420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:17,475-Speed 3353.46 samples/sec Loss 7.4640 LearningRate 0.0688 Epoch: 3 Global Step: 42430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:20,520-Speed 3363.51 samples/sec Loss 7.6554 LearningRate 0.0688 Epoch: 3 Global Step: 42440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:23,614-Speed 3310.61 samples/sec Loss 7.5154 LearningRate 0.0687 Epoch: 3 Global Step: 42450 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:26,736-Speed 3281.32 samples/sec Loss 7.5788 LearningRate 0.0687 Epoch: 3 Global Step: 42460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:29,785-Speed 3359.42 samples/sec Loss 7.4354 LearningRate 0.0687 Epoch: 3 Global Step: 42470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:24:32,819-Speed 3376.60 samples/sec Loss 7.5154 LearningRate 0.0687 Epoch: 3 Global Step: 42480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:35,901-Speed 3322.91 samples/sec Loss 7.6188 LearningRate 0.0687 Epoch: 3 Global Step: 42490 
Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:38,983-Speed 3324.29 samples/sec Loss 7.5036 LearningRate 0.0687 Epoch: 3 Global Step: 42500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:41,989-Speed 3407.32 samples/sec Loss 7.5568 LearningRate 0.0687 Epoch: 3 Global Step: 42510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:45,010-Speed 3390.74 samples/sec Loss 7.4792 LearningRate 0.0687 Epoch: 3 Global Step: 42520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:48,074-Speed 3342.54 samples/sec Loss 7.5445 LearningRate 0.0687 Epoch: 3 Global Step: 42530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:24:51,153-Speed 3326.50 samples/sec Loss 7.5354 LearningRate 0.0687 Epoch: 3 Global Step: 42540 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:24:54,154-Speed 3413.67 samples/sec Loss 7.5676 LearningRate 0.0687 Epoch: 3 Global Step: 42550 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:24:57,153-Speed 3415.64 samples/sec Loss 7.4735 LearningRate 0.0687 Epoch: 3 Global Step: 42560 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:25:00,214-Speed 3346.51 samples/sec Loss 7.5288 LearningRate 0.0687 Epoch: 3 Global Step: 42570 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:25:03,253-Speed 3370.07 samples/sec Loss 7.5321 LearningRate 0.0687 Epoch: 3 Global Step: 42580 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:25:06,347-Speed 3310.76 samples/sec Loss 7.4417 LearningRate 0.0687 Epoch: 3 Global Step: 42590 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:25:09,354-Speed 3406.94 samples/sec Loss 7.5178 LearningRate 0.0686 Epoch: 3 Global Step: 42600 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:25:12,411-Speed 3350.31 samples/sec Loss 7.4863 LearningRate 0.0686 Epoch: 3 Global Step: 42610 Fp16 Grad Scale: 16384 Required: 18 hours Training: 
2022-04-27 05:25:15,532-Speed 3282.48 samples/sec Loss 7.4909 LearningRate 0.0686 Epoch: 3 Global Step: 42620 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:25:18,615-Speed 3322.52 samples/sec Loss 7.6004 LearningRate 0.0686 Epoch: 3 Global Step: 42630 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:25:21,656-Speed 3368.15 samples/sec Loss 7.4406 LearningRate 0.0686 Epoch: 3 Global Step: 42640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:24,689-Speed 3377.87 samples/sec Loss 7.5081 LearningRate 0.0686 Epoch: 3 Global Step: 42650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:27,714-Speed 3385.24 samples/sec Loss 7.5539 LearningRate 0.0686 Epoch: 3 Global Step: 42660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:30,790-Speed 3330.05 samples/sec Loss 7.4574 LearningRate 0.0686 Epoch: 3 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:33,835-Speed 3364.62 samples/sec Loss 7.5343 LearningRate 0.0686 Epoch: 3 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:36,855-Speed 3392.03 samples/sec Loss 7.5218 LearningRate 0.0686 Epoch: 3 Global Step: 42690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:39,880-Speed 3386.42 samples/sec Loss 7.5252 LearningRate 0.0686 Epoch: 3 Global Step: 42700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:42,889-Speed 3403.82 samples/sec Loss 7.5976 LearningRate 0.0686 Epoch: 3 Global Step: 42710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:45,913-Speed 3387.42 samples/sec Loss 7.4963 LearningRate 0.0686 Epoch: 3 Global Step: 42720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:48,971-Speed 3349.80 samples/sec Loss 7.5192 LearningRate 0.0686 Epoch: 3 Global Step: 42730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:25:51,993-Speed 3389.18 samples/sec Loss 
7.5498 LearningRate 0.0686 Epoch: 3 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:25:55,056-Speed 3344.79 samples/sec Loss 7.4965 LearningRate 0.0685 Epoch: 3 Global Step: 42750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:25:58,064-Speed 3405.46 samples/sec Loss 7.4316 LearningRate 0.0685 Epoch: 3 Global Step: 42760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:01,149-Speed 3319.88 samples/sec Loss 7.6222 LearningRate 0.0685 Epoch: 3 Global Step: 42770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:04,268-Speed 3285.02 samples/sec Loss 7.5426 LearningRate 0.0685 Epoch: 3 Global Step: 42780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:07,286-Speed 3393.67 samples/sec Loss 7.4697 LearningRate 0.0685 Epoch: 3 Global Step: 42790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:10,321-Speed 3375.44 samples/sec Loss 7.4910 LearningRate 0.0685 Epoch: 3 Global Step: 42800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:13,340-Speed 3392.18 samples/sec Loss 7.5342 LearningRate 0.0685 Epoch: 3 Global Step: 42810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:16,420-Speed 3326.38 samples/sec Loss 7.6652 LearningRate 0.0685 Epoch: 3 Global Step: 42820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:19,498-Speed 3328.02 samples/sec Loss 7.5759 LearningRate 0.0685 Epoch: 3 Global Step: 42830 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:22,577-Speed 3326.91 samples/sec Loss 7.5609 LearningRate 0.0685 Epoch: 3 Global Step: 42840 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:25,686-Speed 3295.07 samples/sec Loss 7.5815 LearningRate 0.0685 Epoch: 3 Global Step: 42850 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:28,761-Speed 3330.89 samples/sec Loss 7.5720 LearningRate 0.0685 Epoch: 3 Global Step: 42860 
Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:31,804-Speed 3365.86 samples/sec Loss 7.4779 LearningRate 0.0685 Epoch: 3 Global Step: 42870 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:34,837-Speed 3377.09 samples/sec Loss 7.5194 LearningRate 0.0685 Epoch: 3 Global Step: 42880 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:37,850-Speed 3399.76 samples/sec Loss 7.6501 LearningRate 0.0685 Epoch: 3 Global Step: 42890 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:40,851-Speed 3413.92 samples/sec Loss 7.4829 LearningRate 0.0684 Epoch: 3 Global Step: 42900 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:43,860-Speed 3403.49 samples/sec Loss 7.3893 LearningRate 0.0684 Epoch: 3 Global Step: 42910 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:46,919-Speed 3349.37 samples/sec Loss 7.5291 LearningRate 0.0684 Epoch: 3 Global Step: 42920 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 05:26:49,951-Speed 3378.49 samples/sec Loss 7.5244 LearningRate 0.0684 Epoch: 3 Global Step: 42930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:53,045-Speed 3310.75 samples/sec Loss 7.5605 LearningRate 0.0684 Epoch: 3 Global Step: 42940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:56,079-Speed 3376.24 samples/sec Loss 7.5806 LearningRate 0.0684 Epoch: 3 Global Step: 42950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:26:59,126-Speed 3361.49 samples/sec Loss 7.5433 LearningRate 0.0684 Epoch: 3 Global Step: 42960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:02,167-Speed 3368.94 samples/sec Loss 7.5127 LearningRate 0.0684 Epoch: 3 Global Step: 42970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:05,251-Speed 3321.28 samples/sec Loss 7.5309 LearningRate 0.0684 Epoch: 3 Global Step: 42980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:27:08,259-Speed 3405.47 samples/sec Loss 7.5518 LearningRate 0.0684 Epoch: 3 Global Step: 42990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:11,290-Speed 3378.96 samples/sec Loss 7.4917 LearningRate 0.0684 Epoch: 3 Global Step: 43000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:14,321-Speed 3380.07 samples/sec Loss 7.6606 LearningRate 0.0684 Epoch: 3 Global Step: 43010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:17,337-Speed 3395.39 samples/sec Loss 7.6299 LearningRate 0.0684 Epoch: 3 Global Step: 43020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:20,358-Speed 3390.77 samples/sec Loss 7.5729 LearningRate 0.0684 Epoch: 3 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:27:23,379-Speed 3390.52 samples/sec Loss 7.5709 LearningRate 0.0684 Epoch: 3 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:27:26,487-Speed 3296.17 samples/sec Loss 7.5631 LearningRate 0.0683 Epoch: 3 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:27:29,614-Speed 3275.53 samples/sec Loss 7.5579 LearningRate 0.0683 Epoch: 3 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:27:32,703-Speed 3316.49 samples/sec Loss 7.3998 LearningRate 0.0683 Epoch: 3 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:27:35,739-Speed 3374.59 samples/sec Loss 7.6504 LearningRate 0.0683 Epoch: 3 Global Step: 43080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:38,882-Speed 3258.90 samples/sec Loss 7.4927 LearningRate 0.0683 Epoch: 3 Global Step: 43090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:41,929-Speed 3361.68 samples/sec Loss 7.4997 LearningRate 0.0683 Epoch: 3 Global Step: 43100 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:44,972-Speed 3366.33 samples/sec Loss 
7.4496 LearningRate 0.0683 Epoch: 3 Global Step: 43110 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:48,029-Speed 3350.55 samples/sec Loss 7.4922 LearningRate 0.0683 Epoch: 3 Global Step: 43120 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:51,063-Speed 3376.14 samples/sec Loss 7.4629 LearningRate 0.0683 Epoch: 3 Global Step: 43130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:54,104-Speed 3368.21 samples/sec Loss 7.4847 LearningRate 0.0683 Epoch: 3 Global Step: 43140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:27:57,117-Speed 3399.38 samples/sec Loss 7.5906 LearningRate 0.0683 Epoch: 3 Global Step: 43150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:00,167-Speed 3358.42 samples/sec Loss 7.6392 LearningRate 0.0683 Epoch: 3 Global Step: 43160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:03,249-Speed 3323.96 samples/sec Loss 7.5537 LearningRate 0.0683 Epoch: 3 Global Step: 43170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:06,277-Speed 3383.01 samples/sec Loss 7.6076 LearningRate 0.0683 Epoch: 3 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:28:09,297-Speed 3391.11 samples/sec Loss 7.5080 LearningRate 0.0683 Epoch: 3 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:28:12,352-Speed 3353.74 samples/sec Loss 7.4694 LearningRate 0.0682 Epoch: 3 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:28:15,366-Speed 3398.51 samples/sec Loss 7.4732 LearningRate 0.0682 Epoch: 3 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:28:18,451-Speed 3320.56 samples/sec Loss 7.5338 LearningRate 0.0682 Epoch: 3 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:28:21,472-Speed 3390.91 samples/sec Loss 7.5721 LearningRate 0.0682 Epoch: 3 Global Step: 43230 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:28:24,522-Speed 3358.08 samples/sec Loss 7.5162 LearningRate 0.0682 Epoch: 3 Global Step: 43240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:27,538-Speed 3396.30 samples/sec Loss 7.5760 LearningRate 0.0682 Epoch: 3 Global Step: 43250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:30,630-Speed 3313.19 samples/sec Loss 7.5699 LearningRate 0.0682 Epoch: 3 Global Step: 43260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:33,652-Speed 3389.15 samples/sec Loss 7.5879 LearningRate 0.0682 Epoch: 3 Global Step: 43270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:36,711-Speed 3349.34 samples/sec Loss 7.4128 LearningRate 0.0682 Epoch: 3 Global Step: 43280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:39,810-Speed 3305.62 samples/sec Loss 7.4950 LearningRate 0.0682 Epoch: 3 Global Step: 43290 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:42,909-Speed 3304.94 samples/sec Loss 7.5101 LearningRate 0.0682 Epoch: 3 Global Step: 43300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:45,923-Speed 3398.82 samples/sec Loss 7.5845 LearningRate 0.0682 Epoch: 3 Global Step: 43310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:48,942-Speed 3392.43 samples/sec Loss 7.5724 LearningRate 0.0682 Epoch: 3 Global Step: 43320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:51,969-Speed 3384.81 samples/sec Loss 7.5491 LearningRate 0.0682 Epoch: 3 Global Step: 43330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:28:55,016-Speed 3361.82 samples/sec Loss 7.5012 LearningRate 0.0682 Epoch: 3 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 05:28:58,052-Speed 3373.05 samples/sec Loss 7.5601 LearningRate 0.0681 Epoch: 3 Global Step: 43350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 05:29:01,114-Speed 3345.73 samples/sec Loss 7.5477 LearningRate 0.0681 Epoch: 3 Global Step: 43360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:04,135-Speed 3390.57 samples/sec Loss 7.5570 LearningRate 0.0681 Epoch: 3 Global Step: 43370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:07,213-Speed 3327.96 samples/sec Loss 7.5038 LearningRate 0.0681 Epoch: 3 Global Step: 43380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:10,235-Speed 3389.59 samples/sec Loss 7.5064 LearningRate 0.0681 Epoch: 3 Global Step: 43390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:13,279-Speed 3364.41 samples/sec Loss 7.5417 LearningRate 0.0681 Epoch: 3 Global Step: 43400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:16,374-Speed 3310.39 samples/sec Loss 7.4854 LearningRate 0.0681 Epoch: 3 Global Step: 43410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:19,416-Speed 3367.05 samples/sec Loss 7.3955 LearningRate 0.0681 Epoch: 3 Global Step: 43420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:22,453-Speed 3373.15 samples/sec Loss 7.4234 LearningRate 0.0681 Epoch: 3 Global Step: 43430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:25,548-Speed 3308.78 samples/sec Loss 7.7104 LearningRate 0.0681 Epoch: 3 Global Step: 43440 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 05:29:28,610-Speed 3345.47 samples/sec Loss 7.5528 LearningRate 0.0681 Epoch: 3 Global Step: 43450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:29:31,643-Speed 3378.00 samples/sec Loss 7.4994 LearningRate 0.0681 Epoch: 3 Global Step: 43460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:29:34,683-Speed 3368.73 samples/sec Loss 7.5214 LearningRate 0.0681 Epoch: 3 Global Step: 43470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:29:37,743-Speed 3347.83 samples/sec Loss 
7.4671 LearningRate 0.0681 Epoch: 3 Global Step: 43480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:29:40,802-Speed 3348.69 samples/sec Loss 7.6484 LearningRate 0.0681 Epoch: 3 Global Step: 43490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:29:43,886-Speed 3321.28 samples/sec Loss 7.4660 LearningRate 0.0680 Epoch: 3 Global Step: 43500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:29:46,937-Speed 3357.13 samples/sec Loss 7.5448 LearningRate 0.0680 Epoch: 3 Global Step: 43510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:29:49,953-Speed 3396.99 samples/sec Loss 7.5821 LearningRate 0.0680 Epoch: 3 Global Step: 43520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:29:53,082-Speed 3272.81 samples/sec Loss 7.5336 LearningRate 0.0680 Epoch: 3 Global Step: 43530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:29:56,126-Speed 3365.66 samples/sec Loss 7.5939 LearningRate 0.0680 Epoch: 3 Global Step: 43540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:29:59,240-Speed 3289.40 samples/sec Loss 7.5701 LearningRate 0.0680 Epoch: 3 Global Step: 43550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:02,304-Speed 3343.05 samples/sec Loss 7.5295 LearningRate 0.0680 Epoch: 3 Global Step: 43560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:05,359-Speed 3352.81 samples/sec Loss 7.6872 LearningRate 0.0680 Epoch: 3 Global Step: 43570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:08,371-Speed 3400.55 samples/sec Loss 7.5629 LearningRate 0.0680 Epoch: 3 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:30:11,412-Speed 3368.81 samples/sec Loss 7.6270 LearningRate 0.0680 Epoch: 3 Global Step: 43590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:30:14,485-Speed 3333.35 samples/sec Loss 7.5771 LearningRate 0.0680 Epoch: 3 Global Step: 43600 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:30:17,558-Speed 3332.86 samples/sec Loss 7.5731 LearningRate 0.0680 Epoch: 3 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:30:20,579-Speed 3390.88 samples/sec Loss 7.6183 LearningRate 0.0680 Epoch: 3 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:30:23,643-Speed 3343.87 samples/sec Loss 7.5901 LearningRate 0.0680 Epoch: 3 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:30:26,698-Speed 3352.76 samples/sec Loss 7.5562 LearningRate 0.0680 Epoch: 3 Global Step: 43640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:30:29,725-Speed 3383.34 samples/sec Loss 7.5379 LearningRate 0.0679 Epoch: 3 Global Step: 43650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:32,793-Speed 3338.75 samples/sec Loss 7.5188 LearningRate 0.0679 Epoch: 3 Global Step: 43660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:35,877-Speed 3321.52 samples/sec Loss 7.4813 LearningRate 0.0679 Epoch: 3 Global Step: 43670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:38,969-Speed 3313.09 samples/sec Loss 7.5049 LearningRate 0.0679 Epoch: 3 Global Step: 43680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:42,038-Speed 3336.90 samples/sec Loss 7.5331 LearningRate 0.0679 Epoch: 3 Global Step: 43690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:45,054-Speed 3396.60 samples/sec Loss 7.5383 LearningRate 0.0679 Epoch: 3 Global Step: 43700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:48,093-Speed 3370.46 samples/sec Loss 7.6116 LearningRate 0.0679 Epoch: 3 Global Step: 43710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:51,158-Speed 3342.34 samples/sec Loss 7.5785 LearningRate 0.0679 Epoch: 3 Global Step: 43720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:30:54,219-Speed 3346.03 samples/sec Loss 7.5950 LearningRate 0.0679 Epoch: 3 Global Step: 43730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:30:57,268-Speed 3359.54 samples/sec Loss 7.5160 LearningRate 0.0679 Epoch: 3 Global Step: 43740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:00,297-Speed 3381.98 samples/sec Loss 7.6121 LearningRate 0.0679 Epoch: 3 Global Step: 43750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:03,341-Speed 3364.93 samples/sec Loss 7.5876 LearningRate 0.0679 Epoch: 3 Global Step: 43760 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:06,369-Speed 3383.27 samples/sec Loss 7.4856 LearningRate 0.0679 Epoch: 3 Global Step: 43770 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:09,398-Speed 3382.00 samples/sec Loss 7.5642 LearningRate 0.0679 Epoch: 3 Global Step: 43780 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:12,442-Speed 3365.09 samples/sec Loss 7.6148 LearningRate 0.0679 Epoch: 3 Global Step: 43790 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:15,461-Speed 3392.59 samples/sec Loss 7.4620 LearningRate 0.0678 Epoch: 3 Global Step: 43800 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:18,505-Speed 3365.79 samples/sec Loss 7.5393 LearningRate 0.0678 Epoch: 3 Global Step: 43810 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:21,561-Speed 3351.49 samples/sec Loss 7.5178 LearningRate 0.0678 Epoch: 3 Global Step: 43820 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:24,612-Speed 3356.78 samples/sec Loss 7.4997 LearningRate 0.0678 Epoch: 3 Global Step: 43830 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:27,666-Speed 3353.90 samples/sec Loss 7.4732 LearningRate 0.0678 Epoch: 3 Global Step: 43840 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:30,711-Speed 3365.04 samples/sec Loss 
7.4887 LearningRate 0.0678 Epoch: 3 Global Step: 43850 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:31:33,752-Speed 3368.73 samples/sec Loss 7.5241 LearningRate 0.0678 Epoch: 3 Global Step: 43860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:36,765-Speed 3398.97 samples/sec Loss 7.5225 LearningRate 0.0678 Epoch: 3 Global Step: 43870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:39,823-Speed 3349.35 samples/sec Loss 7.5199 LearningRate 0.0678 Epoch: 3 Global Step: 43880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:42,921-Speed 3306.75 samples/sec Loss 7.4490 LearningRate 0.0678 Epoch: 3 Global Step: 43890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:45,935-Speed 3399.36 samples/sec Loss 7.5086 LearningRate 0.0678 Epoch: 3 Global Step: 43900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:48,967-Speed 3377.99 samples/sec Loss 7.5765 LearningRate 0.0678 Epoch: 3 Global Step: 43910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:51,996-Speed 3381.88 samples/sec Loss 7.5249 LearningRate 0.0678 Epoch: 3 Global Step: 43920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:55,041-Speed 3364.25 samples/sec Loss 7.6154 LearningRate 0.0678 Epoch: 3 Global Step: 43930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:31:58,052-Speed 3401.44 samples/sec Loss 7.4834 LearningRate 0.0678 Epoch: 3 Global Step: 43940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:01,112-Speed 3347.45 samples/sec Loss 7.4941 LearningRate 0.0677 Epoch: 3 Global Step: 43950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:04,126-Speed 3397.87 samples/sec Loss 7.5038 LearningRate 0.0677 Epoch: 3 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:32:07,176-Speed 3359.06 samples/sec Loss 7.5279 LearningRate 0.0677 Epoch: 3 Global Step: 43970 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:32:10,201-Speed 3385.84 samples/sec Loss 7.6262 LearningRate 0.0677 Epoch: 3 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:32:13,337-Speed 3266.57 samples/sec Loss 7.6402 LearningRate 0.0677 Epoch: 3 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:32:16,348-Speed 3401.42 samples/sec Loss 7.5196 LearningRate 0.0677 Epoch: 3 Global Step: 44000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:19,357-Speed 3404.19 samples/sec Loss 7.6270 LearningRate 0.0677 Epoch: 3 Global Step: 44010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:22,410-Speed 3355.83 samples/sec Loss 7.5288 LearningRate 0.0677 Epoch: 3 Global Step: 44020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:25,498-Speed 3317.15 samples/sec Loss 7.6149 LearningRate 0.0677 Epoch: 3 Global Step: 44030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:28,543-Speed 3364.17 samples/sec Loss 7.5470 LearningRate 0.0677 Epoch: 3 Global Step: 44040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:31,604-Speed 3345.81 samples/sec Loss 7.5260 LearningRate 0.0677 Epoch: 3 Global Step: 44050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:34,632-Speed 3383.35 samples/sec Loss 7.6378 LearningRate 0.0677 Epoch: 3 Global Step: 44060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:37,650-Speed 3393.77 samples/sec Loss 7.5369 LearningRate 0.0677 Epoch: 3 Global Step: 44070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:40,704-Speed 3354.56 samples/sec Loss 7.4682 LearningRate 0.0677 Epoch: 3 Global Step: 44080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:43,719-Speed 3396.74 samples/sec Loss 7.5495 LearningRate 0.0677 Epoch: 3 Global Step: 44090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:32:46,750-Speed 3380.05 samples/sec Loss 7.5406 LearningRate 0.0676 Epoch: 3 Global Step: 44100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:49,764-Speed 3398.22 samples/sec Loss 7.4209 LearningRate 0.0676 Epoch: 3 Global Step: 44110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:52,821-Speed 3351.46 samples/sec Loss 7.4839 LearningRate 0.0676 Epoch: 3 Global Step: 44120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:55,859-Speed 3371.75 samples/sec Loss 7.5148 LearningRate 0.0676 Epoch: 3 Global Step: 44130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:32:58,969-Speed 3293.65 samples/sec Loss 7.5084 LearningRate 0.0676 Epoch: 3 Global Step: 44140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:02,101-Speed 3270.22 samples/sec Loss 7.5866 LearningRate 0.0676 Epoch: 3 Global Step: 44150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:05,215-Speed 3289.00 samples/sec Loss 7.5690 LearningRate 0.0676 Epoch: 3 Global Step: 44160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:08,228-Speed 3400.54 samples/sec Loss 7.4788 LearningRate 0.0676 Epoch: 3 Global Step: 44170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:11,226-Speed 3416.37 samples/sec Loss 7.5793 LearningRate 0.0676 Epoch: 3 Global Step: 44180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:14,251-Speed 3386.54 samples/sec Loss 7.5597 LearningRate 0.0676 Epoch: 3 Global Step: 44190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:17,303-Speed 3356.27 samples/sec Loss 7.6253 LearningRate 0.0676 Epoch: 3 Global Step: 44200 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:20,335-Speed 3378.02 samples/sec Loss 7.5875 LearningRate 0.0676 Epoch: 3 Global Step: 44210 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:23,344-Speed 3404.20 samples/sec Loss 
7.5225 LearningRate 0.0676 Epoch: 3 Global Step: 44220 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:26,398-Speed 3354.39 samples/sec Loss 7.4267 LearningRate 0.0676 Epoch: 3 Global Step: 44230 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:29,446-Speed 3360.81 samples/sec Loss 7.6105 LearningRate 0.0676 Epoch: 3 Global Step: 44240 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:32,526-Speed 3325.46 samples/sec Loss 7.5940 LearningRate 0.0675 Epoch: 3 Global Step: 44250 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:35,580-Speed 3354.19 samples/sec Loss 7.6105 LearningRate 0.0675 Epoch: 3 Global Step: 44260 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:38,616-Speed 3373.79 samples/sec Loss 7.5574 LearningRate 0.0675 Epoch: 3 Global Step: 44270 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:41,672-Speed 3352.53 samples/sec Loss 7.5707 LearningRate 0.0675 Epoch: 3 Global Step: 44280 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:44,673-Speed 3412.90 samples/sec Loss 7.4614 LearningRate 0.0675 Epoch: 3 Global Step: 44290 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:33:47,689-Speed 3396.19 samples/sec Loss 7.5753 LearningRate 0.0675 Epoch: 3 Global Step: 44300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:50,684-Speed 3420.28 samples/sec Loss 7.5770 LearningRate 0.0675 Epoch: 3 Global Step: 44310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:53,721-Speed 3372.07 samples/sec Loss 7.5363 LearningRate 0.0675 Epoch: 3 Global Step: 44320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:56,735-Speed 3399.36 samples/sec Loss 7.5200 LearningRate 0.0675 Epoch: 3 Global Step: 44330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:33:59,783-Speed 3360.66 samples/sec Loss 7.6177 LearningRate 0.0675 Epoch: 3 Global Step: 44340 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:34:02,844-Speed 3346.75 samples/sec Loss 7.5872 LearningRate 0.0675 Epoch: 3 Global Step: 44350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:34:05,863-Speed 3392.93 samples/sec Loss 7.5800 LearningRate 0.0675 Epoch: 3 Global Step: 44360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:34:08,859-Speed 3418.86 samples/sec Loss 7.5623 LearningRate 0.0675 Epoch: 3 Global Step: 44370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:34:11,882-Speed 3388.17 samples/sec Loss 7.5357 LearningRate 0.0675 Epoch: 3 Global Step: 44380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:34:14,965-Speed 3323.20 samples/sec Loss 7.6262 LearningRate 0.0675 Epoch: 3 Global Step: 44390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:34:17,999-Speed 3375.73 samples/sec Loss 7.5603 LearningRate 0.0674 Epoch: 3 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:34:21,006-Speed 3406.14 samples/sec Loss 7.6434 LearningRate 0.0674 Epoch: 3 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:34:24,017-Speed 3402.04 samples/sec Loss 7.5603 LearningRate 0.0674 Epoch: 3 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:34:27,016-Speed 3416.15 samples/sec Loss 7.7226 LearningRate 0.0674 Epoch: 3 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:34:30,057-Speed 3368.21 samples/sec Loss 7.4346 LearningRate 0.0674 Epoch: 3 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:34:33,057-Speed 3414.10 samples/sec Loss 7.4513 LearningRate 0.0674 Epoch: 3 Global Step: 44450 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:34:36,106-Speed 3360.23 samples/sec Loss 7.5224 LearningRate 0.0674 Epoch: 3 Global Step: 44460 Fp16 Grad Scale: 16384 Required: 17 hours Training: 
2022-04-27 05:34:39,134-Speed 3382.62 samples/sec Loss 7.6057 LearningRate 0.0674 Epoch: 3 Global Step: 44470 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:34:42,164-Speed 3380.54 samples/sec Loss 7.4901 LearningRate 0.0674 Epoch: 3 Global Step: 44480 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:34:45,191-Speed 3383.94 samples/sec Loss 7.4916 LearningRate 0.0674 Epoch: 3 Global Step: 44490 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:34:48,245-Speed 3354.30 samples/sec Loss 7.6102 LearningRate 0.0674 Epoch: 3 Global Step: 44500 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:34:51,252-Speed 3406.50 samples/sec Loss 7.5165 LearningRate 0.0674 Epoch: 3 Global Step: 44510 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:34:54,317-Speed 3342.22 samples/sec Loss 7.6631 LearningRate 0.0674 Epoch: 3 Global Step: 44520 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:34:57,367-Speed 3358.59 samples/sec Loss 7.5373 LearningRate 0.0674 Epoch: 3 Global Step: 44530 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:35:00,379-Speed 3401.14 samples/sec Loss 7.3804 LearningRate 0.0674 Epoch: 3 Global Step: 44540 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:35:03,451-Speed 3333.76 samples/sec Loss 7.5254 LearningRate 0.0673 Epoch: 3 Global Step: 44550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:06,486-Speed 3375.91 samples/sec Loss 7.6622 LearningRate 0.0673 Epoch: 3 Global Step: 44560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:09,495-Speed 3403.23 samples/sec Loss 7.6247 LearningRate 0.0673 Epoch: 3 Global Step: 44570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:12,494-Speed 3416.24 samples/sec Loss 7.5662 LearningRate 0.0673 Epoch: 3 Global Step: 44580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:15,505-Speed 3402.57 samples/sec Loss 
7.6390 LearningRate 0.0673 Epoch: 3 Global Step: 44590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:18,512-Speed 3405.73 samples/sec Loss 7.5659 LearningRate 0.0673 Epoch: 3 Global Step: 44600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:21,518-Speed 3408.14 samples/sec Loss 7.4225 LearningRate 0.0673 Epoch: 3 Global Step: 44610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:24,578-Speed 3347.39 samples/sec Loss 7.6148 LearningRate 0.0673 Epoch: 3 Global Step: 44620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:27,575-Speed 3417.75 samples/sec Loss 7.5134 LearningRate 0.0673 Epoch: 3 Global Step: 44630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:30,612-Speed 3373.60 samples/sec Loss 7.5330 LearningRate 0.0673 Epoch: 3 Global Step: 44640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:35:33,617-Speed 3408.36 samples/sec Loss 7.5746 LearningRate 0.0673 Epoch: 3 Global Step: 44650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:36,642-Speed 3385.97 samples/sec Loss 7.5459 LearningRate 0.0673 Epoch: 3 Global Step: 44660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:39,689-Speed 3361.51 samples/sec Loss 7.6299 LearningRate 0.0673 Epoch: 3 Global Step: 44670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:42,738-Speed 3359.84 samples/sec Loss 7.5539 LearningRate 0.0673 Epoch: 3 Global Step: 44680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:45,728-Speed 3425.72 samples/sec Loss 7.6051 LearningRate 0.0673 Epoch: 3 Global Step: 44690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:48,727-Speed 3415.20 samples/sec Loss 7.6377 LearningRate 0.0673 Epoch: 3 Global Step: 44700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:51,789-Speed 3345.73 samples/sec Loss 7.4449 LearningRate 0.0672 Epoch: 3 Global Step: 44710 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:54,855-Speed 3341.34 samples/sec Loss 7.4735 LearningRate 0.0672 Epoch: 3 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:35:57,855-Speed 3413.87 samples/sec Loss 7.4337 LearningRate 0.0672 Epoch: 3 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:36:00,880-Speed 3386.05 samples/sec Loss 7.5328 LearningRate 0.0672 Epoch: 3 Global Step: 44740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:03,942-Speed 3345.08 samples/sec Loss 7.6999 LearningRate 0.0672 Epoch: 3 Global Step: 44750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:07,049-Speed 3297.80 samples/sec Loss 7.6829 LearningRate 0.0672 Epoch: 3 Global Step: 44760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:10,059-Speed 3402.63 samples/sec Loss 7.6161 LearningRate 0.0672 Epoch: 3 Global Step: 44770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:13,066-Speed 3407.02 samples/sec Loss 7.6223 LearningRate 0.0672 Epoch: 3 Global Step: 44780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:16,127-Speed 3346.73 samples/sec Loss 7.4270 LearningRate 0.0672 Epoch: 3 Global Step: 44790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:19,152-Speed 3385.50 samples/sec Loss 7.6081 LearningRate 0.0672 Epoch: 3 Global Step: 44800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:22,168-Speed 3396.13 samples/sec Loss 7.5573 LearningRate 0.0672 Epoch: 3 Global Step: 44810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:25,197-Speed 3381.88 samples/sec Loss 7.5093 LearningRate 0.0672 Epoch: 3 Global Step: 44820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:28,263-Speed 3341.52 samples/sec Loss 7.5208 LearningRate 0.0672 Epoch: 3 Global Step: 44830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:36:31,311-Speed 3359.76 samples/sec Loss 7.5335 LearningRate 0.0672 Epoch: 3 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:36:34,330-Speed 3393.35 samples/sec Loss 7.5072 LearningRate 0.0672 Epoch: 3 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:36:37,364-Speed 3377.18 samples/sec Loss 7.5011 LearningRate 0.0671 Epoch: 3 Global Step: 44860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:40,459-Speed 3308.99 samples/sec Loss 7.5673 LearningRate 0.0671 Epoch: 3 Global Step: 44870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:43,484-Speed 3386.40 samples/sec Loss 7.6352 LearningRate 0.0671 Epoch: 3 Global Step: 44880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:46,490-Speed 3407.86 samples/sec Loss 7.5980 LearningRate 0.0671 Epoch: 3 Global Step: 44890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:49,525-Speed 3374.81 samples/sec Loss 7.5939 LearningRate 0.0671 Epoch: 3 Global Step: 44900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:52,611-Speed 3318.85 samples/sec Loss 7.5357 LearningRate 0.0671 Epoch: 3 Global Step: 44910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:55,638-Speed 3385.29 samples/sec Loss 7.5911 LearningRate 0.0671 Epoch: 3 Global Step: 44920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:36:58,687-Speed 3358.62 samples/sec Loss 7.5362 LearningRate 0.0671 Epoch: 3 Global Step: 44930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:01,711-Speed 3387.73 samples/sec Loss 7.4876 LearningRate 0.0671 Epoch: 3 Global Step: 44940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:04,828-Speed 3285.94 samples/sec Loss 7.6033 LearningRate 0.0671 Epoch: 3 Global Step: 44950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:07,869-Speed 3368.71 samples/sec Loss 
7.5140 LearningRate 0.0671 Epoch: 3 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:37:10,965-Speed 3308.46 samples/sec Loss 7.4884 LearningRate 0.0671 Epoch: 3 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:37:14,027-Speed 3345.23 samples/sec Loss 7.5557 LearningRate 0.0671 Epoch: 3 Global Step: 44980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:17,056-Speed 3381.61 samples/sec Loss 7.5106 LearningRate 0.0671 Epoch: 3 Global Step: 44990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:20,070-Speed 3399.11 samples/sec Loss 7.5047 LearningRate 0.0671 Epoch: 3 Global Step: 45000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:23,093-Speed 3388.36 samples/sec Loss 7.6723 LearningRate 0.0670 Epoch: 3 Global Step: 45010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:26,163-Speed 3335.89 samples/sec Loss 7.5506 LearningRate 0.0670 Epoch: 3 Global Step: 45020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:29,219-Speed 3352.82 samples/sec Loss 7.6022 LearningRate 0.0670 Epoch: 3 Global Step: 45030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:32,233-Speed 3397.95 samples/sec Loss 7.5456 LearningRate 0.0670 Epoch: 3 Global Step: 45040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:35,288-Speed 3352.92 samples/sec Loss 7.5029 LearningRate 0.0670 Epoch: 3 Global Step: 45050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:38,321-Speed 3377.40 samples/sec Loss 7.5705 LearningRate 0.0670 Epoch: 3 Global Step: 45060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:41,368-Speed 3361.65 samples/sec Loss 7.5523 LearningRate 0.0670 Epoch: 3 Global Step: 45070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:44,366-Speed 3416.80 samples/sec Loss 7.5692 LearningRate 0.0670 Epoch: 3 Global Step: 45080 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:37:47,368-Speed 3412.42 samples/sec Loss 7.4516 LearningRate 0.0670 Epoch: 3 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:37:50,401-Speed 3377.67 samples/sec Loss 7.5485 LearningRate 0.0670 Epoch: 3 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:37:53,448-Speed 3362.32 samples/sec Loss 7.6283 LearningRate 0.0670 Epoch: 3 Global Step: 45110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:56,468-Speed 3390.79 samples/sec Loss 7.5638 LearningRate 0.0670 Epoch: 3 Global Step: 45120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:37:59,485-Speed 3395.51 samples/sec Loss 7.6041 LearningRate 0.0670 Epoch: 3 Global Step: 45130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:38:02,510-Speed 3385.94 samples/sec Loss 7.5230 LearningRate 0.0670 Epoch: 3 Global Step: 45140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:38:05,557-Speed 3362.48 samples/sec Loss 7.5237 LearningRate 0.0670 Epoch: 3 Global Step: 45150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:38:08,553-Speed 3418.09 samples/sec Loss 7.4670 LearningRate 0.0669 Epoch: 3 Global Step: 45160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:38:11,547-Speed 3421.69 samples/sec Loss 7.5292 LearningRate 0.0669 Epoch: 3 Global Step: 45170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:38:14,561-Speed 3398.43 samples/sec Loss 7.6215 LearningRate 0.0669 Epoch: 3 Global Step: 45180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:38:17,555-Speed 3421.97 samples/sec Loss 7.5359 LearningRate 0.0669 Epoch: 3 Global Step: 45190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:38:20,589-Speed 3376.20 samples/sec Loss 7.5676 LearningRate 0.0669 Epoch: 3 Global Step: 45200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:38:23,638-Speed 3358.85 samples/sec Loss 7.5588 LearningRate 0.0669 Epoch: 3 Global Step: 45210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:26,657-Speed 3393.08 samples/sec Loss 7.6187 LearningRate 0.0669 Epoch: 3 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:29,718-Speed 3346.85 samples/sec Loss 7.6260 LearningRate 0.0669 Epoch: 3 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:32,721-Speed 3411.06 samples/sec Loss 7.5811 LearningRate 0.0669 Epoch: 3 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:35,788-Speed 3340.13 samples/sec Loss 7.5844 LearningRate 0.0669 Epoch: 3 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:38,792-Speed 3408.84 samples/sec Loss 7.5784 LearningRate 0.0669 Epoch: 3 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:41,834-Speed 3367.17 samples/sec Loss 7.6910 LearningRate 0.0669 Epoch: 3 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:44,889-Speed 3353.82 samples/sec Loss 7.5063 LearningRate 0.0669 Epoch: 3 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:47,999-Speed 3294.19 samples/sec Loss 7.5262 LearningRate 0.0669 Epoch: 3 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:51,120-Speed 3281.73 samples/sec Loss 7.5651 LearningRate 0.0669 Epoch: 3 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:54,176-Speed 3351.98 samples/sec Loss 7.4743 LearningRate 0.0668 Epoch: 3 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:38:57,204-Speed 3382.87 samples/sec Loss 7.5418 LearningRate 0.0668 Epoch: 3 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:39:00,216-Speed 3400.70 samples/sec Loss 
7.6053 LearningRate 0.0668 Epoch: 3 Global Step: 45330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:03,284-Speed 3338.05 samples/sec Loss 7.5198 LearningRate 0.0668 Epoch: 3 Global Step: 45340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:06,309-Speed 3387.17 samples/sec Loss 7.5082 LearningRate 0.0668 Epoch: 3 Global Step: 45350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:09,330-Speed 3390.03 samples/sec Loss 7.5299 LearningRate 0.0668 Epoch: 3 Global Step: 45360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:12,439-Speed 3294.97 samples/sec Loss 7.5228 LearningRate 0.0668 Epoch: 3 Global Step: 45370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:15,536-Speed 3307.62 samples/sec Loss 7.6662 LearningRate 0.0668 Epoch: 3 Global Step: 45380 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:18,603-Speed 3340.08 samples/sec Loss 7.5568 LearningRate 0.0668 Epoch: 3 Global Step: 45390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:21,623-Speed 3391.89 samples/sec Loss 7.5380 LearningRate 0.0668 Epoch: 3 Global Step: 45400 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:24,671-Speed 3361.26 samples/sec Loss 7.4940 LearningRate 0.0668 Epoch: 3 Global Step: 45410 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:27,705-Speed 3376.51 samples/sec Loss 7.6552 LearningRate 0.0668 Epoch: 3 Global Step: 45420 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:30,794-Speed 3314.88 samples/sec Loss 7.4068 LearningRate 0.0668 Epoch: 3 Global Step: 45430 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:33,850-Speed 3352.03 samples/sec Loss 7.5172 LearningRate 0.0668 Epoch: 3 Global Step: 45440 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:36,888-Speed 3372.13 samples/sec Loss 7.5522 LearningRate 0.0668 Epoch: 3 Global Step: 45450 
Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:39:39,953-Speed 3341.57 samples/sec Loss 7.6989 LearningRate 0.0667 Epoch: 3 Global Step: 45460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:42,981-Speed 3383.23 samples/sec Loss 7.4533 LearningRate 0.0667 Epoch: 3 Global Step: 45470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:45,990-Speed 3404.98 samples/sec Loss 7.4122 LearningRate 0.0667 Epoch: 3 Global Step: 45480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:49,039-Speed 3359.32 samples/sec Loss 7.5921 LearningRate 0.0667 Epoch: 3 Global Step: 45490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:52,075-Speed 3374.47 samples/sec Loss 7.4472 LearningRate 0.0667 Epoch: 3 Global Step: 45500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:55,150-Speed 3331.91 samples/sec Loss 7.6043 LearningRate 0.0667 Epoch: 3 Global Step: 45510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:39:58,189-Speed 3370.63 samples/sec Loss 7.6322 LearningRate 0.0667 Epoch: 3 Global Step: 45520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:01,214-Speed 3386.27 samples/sec Loss 7.3948 LearningRate 0.0667 Epoch: 3 Global Step: 45530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:04,227-Speed 3399.59 samples/sec Loss 7.5449 LearningRate 0.0667 Epoch: 3 Global Step: 45540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:07,282-Speed 3352.86 samples/sec Loss 7.5602 LearningRate 0.0667 Epoch: 3 Global Step: 45550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:10,345-Speed 3344.28 samples/sec Loss 7.4791 LearningRate 0.0667 Epoch: 3 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:40:13,387-Speed 3367.28 samples/sec Loss 7.4613 LearningRate 0.0667 Epoch: 3 Global Step: 45570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:40:16,454-Speed 3339.66 samples/sec Loss 7.5665 LearningRate 0.0667 Epoch: 3 Global Step: 45580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:19,546-Speed 3312.79 samples/sec Loss 7.5102 LearningRate 0.0667 Epoch: 3 Global Step: 45590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:22,599-Speed 3355.18 samples/sec Loss 7.5249 LearningRate 0.0667 Epoch: 3 Global Step: 45600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:25,620-Speed 3390.84 samples/sec Loss 7.5539 LearningRate 0.0667 Epoch: 3 Global Step: 45610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:28,661-Speed 3368.57 samples/sec Loss 7.4630 LearningRate 0.0666 Epoch: 3 Global Step: 45620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:31,686-Speed 3387.21 samples/sec Loss 7.5897 LearningRate 0.0666 Epoch: 3 Global Step: 45630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:34,696-Speed 3402.82 samples/sec Loss 7.5972 LearningRate 0.0666 Epoch: 3 Global Step: 45640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:37,730-Speed 3375.99 samples/sec Loss 7.5583 LearningRate 0.0666 Epoch: 3 Global Step: 45650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:40,749-Speed 3392.71 samples/sec Loss 7.4566 LearningRate 0.0666 Epoch: 3 Global Step: 45660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:40:43,774-Speed 3386.35 samples/sec Loss 7.4990 LearningRate 0.0666 Epoch: 3 Global Step: 45670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:40:46,824-Speed 3358.66 samples/sec Loss 7.4767 LearningRate 0.0666 Epoch: 3 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:40:49,893-Speed 3336.86 samples/sec Loss 7.6224 LearningRate 0.0666 Epoch: 3 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:40:52,950-Speed 3351.31 samples/sec Loss 
7.4921 LearningRate 0.0666 Epoch: 3 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:40:55,992-Speed 3367.11 samples/sec Loss 7.5759 LearningRate 0.0666 Epoch: 3 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:40:59,007-Speed 3397.62 samples/sec Loss 7.6152 LearningRate 0.0666 Epoch: 3 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:41:02,086-Speed 3326.88 samples/sec Loss 7.4919 LearningRate 0.0666 Epoch: 3 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:41:05,161-Speed 3331.13 samples/sec Loss 7.6104 LearningRate 0.0666 Epoch: 3 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:41:08,222-Speed 3346.29 samples/sec Loss 7.5572 LearningRate 0.0666 Epoch: 3 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:41:11,255-Speed 3376.81 samples/sec Loss 7.6105 LearningRate 0.0666 Epoch: 3 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:41:14,293-Speed 3371.84 samples/sec Loss 7.5838 LearningRate 0.0665 Epoch: 3 Global Step: 45770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:41:17,361-Speed 3338.88 samples/sec Loss 7.6122 LearningRate 0.0665 Epoch: 3 Global Step: 45780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:41:20,417-Speed 3351.32 samples/sec Loss 7.6048 LearningRate 0.0665 Epoch: 3 Global Step: 45790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:41:23,438-Speed 3390.66 samples/sec Loss 7.4369 LearningRate 0.0665 Epoch: 3 Global Step: 45800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:41:26,502-Speed 3343.79 samples/sec Loss 7.4421 LearningRate 0.0665 Epoch: 3 Global Step: 45810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:41:29,536-Speed 3375.41 samples/sec Loss 7.5793 LearningRate 0.0665 Epoch: 3 Global Step: 45820 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:41:32,573-Speed 3372.91 samples/sec Loss 7.4589 LearningRate 0.0665 Epoch: 3 Global Step: 45830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:41:35,603-Speed 3380.76 samples/sec Loss 7.6785 LearningRate 0.0665 Epoch: 3 Global Step: 45840 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:41:38,685-Speed 3324.11 samples/sec Loss 7.6255 LearningRate 0.0665 Epoch: 3 Global Step: 45850 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:41:41,704-Speed 3393.11 samples/sec Loss 7.5653 LearningRate 0.0665 Epoch: 3 Global Step: 45860 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:41:44,714-Speed 3403.32 samples/sec Loss 7.6254 LearningRate 0.0665 Epoch: 3 Global Step: 45870 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:41:47,739-Speed 3386.28 samples/sec Loss 7.4980 LearningRate 0.0665 Epoch: 3 Global Step: 45880 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:41:50,799-Speed 3347.69 samples/sec Loss 7.6885 LearningRate 0.0665 Epoch: 3 Global Step: 45890 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:41:53,905-Speed 3298.62 samples/sec Loss 7.4465 LearningRate 0.0665 Epoch: 3 Global Step: 45900 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:41:56,939-Speed 3375.70 samples/sec Loss 7.4577 LearningRate 0.0665 Epoch: 3 Global Step: 45910 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:42:00,028-Speed 3315.82 samples/sec Loss 7.4545 LearningRate 0.0664 Epoch: 3 Global Step: 45920 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:42:03,098-Speed 3336.81 samples/sec Loss 7.4130 LearningRate 0.0664 Epoch: 3 Global Step: 45930 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:42:06,138-Speed 3369.16 samples/sec Loss 7.4811 LearningRate 0.0664 Epoch: 3 Global Step: 45940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:42:09,165-Speed 3384.70 samples/sec Loss 7.5252 LearningRate 0.0664 Epoch: 3 Global Step: 45950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:12,234-Speed 3336.86 samples/sec Loss 7.4907 LearningRate 0.0664 Epoch: 3 Global Step: 45960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:15,351-Speed 3286.28 samples/sec Loss 7.5365 LearningRate 0.0664 Epoch: 3 Global Step: 45970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:18,412-Speed 3347.10 samples/sec Loss 7.5359 LearningRate 0.0664 Epoch: 3 Global Step: 45980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:21,434-Speed 3389.81 samples/sec Loss 7.5313 LearningRate 0.0664 Epoch: 3 Global Step: 45990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:24,482-Speed 3360.28 samples/sec Loss 7.4928 LearningRate 0.0664 Epoch: 3 Global Step: 46000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:27,574-Speed 3313.61 samples/sec Loss 7.3824 LearningRate 0.0664 Epoch: 3 Global Step: 46010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:30,632-Speed 3348.52 samples/sec Loss 7.5834 LearningRate 0.0664 Epoch: 3 Global Step: 46020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:33,658-Speed 3385.36 samples/sec Loss 7.4849 LearningRate 0.0664 Epoch: 3 Global Step: 46030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:36,682-Speed 3387.66 samples/sec Loss 7.6149 LearningRate 0.0664 Epoch: 3 Global Step: 46040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:39,770-Speed 3316.87 samples/sec Loss 7.5191 LearningRate 0.0664 Epoch: 3 Global Step: 46050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:42,830-Speed 3347.61 samples/sec Loss 7.4204 LearningRate 0.0664 Epoch: 3 Global Step: 46060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:45,868-Speed 3372.23 samples/sec Loss 
7.5014 LearningRate 0.0663 Epoch: 3 Global Step: 46070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:48,916-Speed 3360.00 samples/sec Loss 7.6289 LearningRate 0.0663 Epoch: 3 Global Step: 46080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:51,967-Speed 3358.13 samples/sec Loss 7.3887 LearningRate 0.0663 Epoch: 3 Global Step: 46090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:55,057-Speed 3314.87 samples/sec Loss 7.4761 LearningRate 0.0663 Epoch: 3 Global Step: 46100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:42:58,061-Speed 3409.33 samples/sec Loss 7.5146 LearningRate 0.0663 Epoch: 3 Global Step: 46110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:01,121-Speed 3347.84 samples/sec Loss 7.6252 LearningRate 0.0663 Epoch: 3 Global Step: 46120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:04,183-Speed 3345.22 samples/sec Loss 7.5904 LearningRate 0.0663 Epoch: 3 Global Step: 46130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:07,250-Speed 3340.75 samples/sec Loss 7.5459 LearningRate 0.0663 Epoch: 3 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:43:10,254-Speed 3409.43 samples/sec Loss 7.4984 LearningRate 0.0663 Epoch: 3 Global Step: 46150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:13,282-Speed 3382.92 samples/sec Loss 7.4686 LearningRate 0.0663 Epoch: 3 Global Step: 46160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:16,366-Speed 3321.56 samples/sec Loss 7.5442 LearningRate 0.0663 Epoch: 3 Global Step: 46170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:19,424-Speed 3349.20 samples/sec Loss 7.4716 LearningRate 0.0663 Epoch: 3 Global Step: 46180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:22,501-Speed 3329.13 samples/sec Loss 7.5313 LearningRate 0.0663 Epoch: 3 Global Step: 46190 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:25,568-Speed 3340.20 samples/sec Loss 7.5500 LearningRate 0.0663 Epoch: 3 Global Step: 46200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:28,615-Speed 3361.81 samples/sec Loss 7.7120 LearningRate 0.0663 Epoch: 3 Global Step: 46210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:31,692-Speed 3328.66 samples/sec Loss 7.5266 LearningRate 0.0663 Epoch: 3 Global Step: 46220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:34,740-Speed 3360.60 samples/sec Loss 7.6095 LearningRate 0.0662 Epoch: 3 Global Step: 46230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:37,759-Speed 3393.18 samples/sec Loss 7.5777 LearningRate 0.0662 Epoch: 3 Global Step: 46240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:40,834-Speed 3330.44 samples/sec Loss 7.5033 LearningRate 0.0662 Epoch: 3 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:43:43,910-Speed 3330.25 samples/sec Loss 7.3553 LearningRate 0.0662 Epoch: 3 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:43:46,914-Speed 3410.37 samples/sec Loss 7.3906 LearningRate 0.0662 Epoch: 3 Global Step: 46270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:49,971-Speed 3351.32 samples/sec Loss 7.5657 LearningRate 0.0662 Epoch: 3 Global Step: 46280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:53,017-Speed 3362.58 samples/sec Loss 7.5373 LearningRate 0.0662 Epoch: 3 Global Step: 46290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:56,050-Speed 3376.82 samples/sec Loss 7.4441 LearningRate 0.0662 Epoch: 3 Global Step: 46300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:43:59,134-Speed 3322.03 samples/sec Loss 7.4851 LearningRate 0.0662 Epoch: 3 Global Step: 46310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:44:02,190-Speed 3351.25 samples/sec Loss 7.5670 LearningRate 0.0662 Epoch: 3 Global Step: 46320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:05,263-Speed 3334.10 samples/sec Loss 7.5673 LearningRate 0.0662 Epoch: 3 Global Step: 46330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:08,284-Speed 3390.36 samples/sec Loss 7.4080 LearningRate 0.0662 Epoch: 3 Global Step: 46340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:11,341-Speed 3350.18 samples/sec Loss 7.4939 LearningRate 0.0662 Epoch: 3 Global Step: 46350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:14,440-Speed 3305.61 samples/sec Loss 7.5126 LearningRate 0.0662 Epoch: 3 Global Step: 46360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:17,472-Speed 3379.35 samples/sec Loss 7.5424 LearningRate 0.0662 Epoch: 3 Global Step: 46370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:20,496-Speed 3386.65 samples/sec Loss 7.5686 LearningRate 0.0661 Epoch: 3 Global Step: 46380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:23,525-Speed 3382.17 samples/sec Loss 7.6213 LearningRate 0.0661 Epoch: 3 Global Step: 46390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:26,635-Speed 3293.15 samples/sec Loss 7.4619 LearningRate 0.0661 Epoch: 3 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:29,653-Speed 3394.29 samples/sec Loss 7.4423 LearningRate 0.0661 Epoch: 3 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:32,723-Speed 3336.71 samples/sec Loss 7.4905 LearningRate 0.0661 Epoch: 3 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:35,777-Speed 3353.65 samples/sec Loss 7.4911 LearningRate 0.0661 Epoch: 3 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:38,824-Speed 3361.61 samples/sec Loss 
7.5681 LearningRate 0.0661 Epoch: 3 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:41,904-Speed 3325.69 samples/sec Loss 7.5733 LearningRate 0.0661 Epoch: 3 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:44,929-Speed 3386.00 samples/sec Loss 7.5489 LearningRate 0.0661 Epoch: 3 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:44:47,977-Speed 3361.27 samples/sec Loss 7.4543 LearningRate 0.0661 Epoch: 3 Global Step: 46470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:51,037-Speed 3347.03 samples/sec Loss 7.4479 LearningRate 0.0661 Epoch: 3 Global Step: 46480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:54,070-Speed 3377.70 samples/sec Loss 7.4548 LearningRate 0.0661 Epoch: 3 Global Step: 46490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:44:57,094-Speed 3386.55 samples/sec Loss 7.4717 LearningRate 0.0661 Epoch: 3 Global Step: 46500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:00,159-Speed 3342.09 samples/sec Loss 7.4274 LearningRate 0.0661 Epoch: 3 Global Step: 46510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:03,189-Speed 3380.83 samples/sec Loss 7.6280 LearningRate 0.0661 Epoch: 3 Global Step: 46520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:06,226-Speed 3372.86 samples/sec Loss 7.3762 LearningRate 0.0660 Epoch: 3 Global Step: 46530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:09,246-Speed 3392.14 samples/sec Loss 7.5075 LearningRate 0.0660 Epoch: 3 Global Step: 46540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:12,299-Speed 3354.58 samples/sec Loss 7.4923 LearningRate 0.0660 Epoch: 3 Global Step: 46550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:15,329-Speed 3381.20 samples/sec Loss 7.3992 LearningRate 0.0660 Epoch: 3 Global Step: 46560 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:18,343-Speed 3398.06 samples/sec Loss 7.6081 LearningRate 0.0660 Epoch: 3 Global Step: 46570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:21,383-Speed 3369.12 samples/sec Loss 7.4350 LearningRate 0.0660 Epoch: 3 Global Step: 46580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:24,411-Speed 3383.17 samples/sec Loss 7.4141 LearningRate 0.0660 Epoch: 3 Global Step: 46590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:27,495-Speed 3321.06 samples/sec Loss 7.5602 LearningRate 0.0660 Epoch: 3 Global Step: 46600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:30,532-Speed 3372.87 samples/sec Loss 7.4996 LearningRate 0.0660 Epoch: 3 Global Step: 46610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:33,544-Speed 3400.54 samples/sec Loss 7.5050 LearningRate 0.0660 Epoch: 3 Global Step: 46620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:36,622-Speed 3328.67 samples/sec Loss 7.3801 LearningRate 0.0660 Epoch: 3 Global Step: 46630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:45:39,611-Speed 3426.83 samples/sec Loss 7.5661 LearningRate 0.0660 Epoch: 3 Global Step: 46640 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:45:42,658-Speed 3362.38 samples/sec Loss 7.4166 LearningRate 0.0660 Epoch: 3 Global Step: 46650 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:45:45,678-Speed 3391.97 samples/sec Loss 7.4605 LearningRate 0.0660 Epoch: 3 Global Step: 46660 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:45:48,686-Speed 3404.99 samples/sec Loss 7.4395 LearningRate 0.0660 Epoch: 3 Global Step: 46670 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:45:51,703-Speed 3395.71 samples/sec Loss 7.5638 LearningRate 0.0659 Epoch: 3 Global Step: 46680 Fp16 Grad Scale: 16384 Required: 17 hours Training: 
2022-04-27 05:45:54,770-Speed 3339.40 samples/sec Loss 7.4883 LearningRate 0.0659 Epoch: 3 Global Step: 46690 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:45:57,778-Speed 3405.99 samples/sec Loss 7.5005 LearningRate 0.0659 Epoch: 3 Global Step: 46700 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:46:00,864-Speed 3319.29 samples/sec Loss 7.6275 LearningRate 0.0659 Epoch: 3 Global Step: 46710 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:46:03,917-Speed 3354.73 samples/sec Loss 7.4946 LearningRate 0.0659 Epoch: 3 Global Step: 46720 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:46:06,969-Speed 3356.46 samples/sec Loss 7.4220 LearningRate 0.0659 Epoch: 3 Global Step: 46730 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:46:09,961-Speed 3424.32 samples/sec Loss 7.5503 LearningRate 0.0659 Epoch: 3 Global Step: 46740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:12,983-Speed 3388.89 samples/sec Loss 7.4860 LearningRate 0.0659 Epoch: 3 Global Step: 46750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:16,043-Speed 3347.83 samples/sec Loss 7.4632 LearningRate 0.0659 Epoch: 3 Global Step: 46760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:19,050-Speed 3406.94 samples/sec Loss 7.4792 LearningRate 0.0659 Epoch: 3 Global Step: 46770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:22,053-Speed 3410.84 samples/sec Loss 7.5485 LearningRate 0.0659 Epoch: 3 Global Step: 46780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:25,087-Speed 3375.88 samples/sec Loss 7.5574 LearningRate 0.0659 Epoch: 3 Global Step: 46790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:28,193-Speed 3298.02 samples/sec Loss 7.4669 LearningRate 0.0659 Epoch: 3 Global Step: 46800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:31,326-Speed 3269.26 samples/sec Loss 
7.5399 LearningRate 0.0659 Epoch: 3 Global Step: 46810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:34,351-Speed 3386.47 samples/sec Loss 7.4726 LearningRate 0.0659 Epoch: 3 Global Step: 46820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:37,404-Speed 3354.63 samples/sec Loss 7.5804 LearningRate 0.0659 Epoch: 3 Global Step: 46830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:46:40,406-Speed 3412.70 samples/sec Loss 7.3402 LearningRate 0.0658 Epoch: 3 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:46:43,441-Speed 3374.97 samples/sec Loss 7.4205 LearningRate 0.0658 Epoch: 3 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:46:46,438-Speed 3418.06 samples/sec Loss 7.5271 LearningRate 0.0658 Epoch: 3 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:46:49,435-Speed 3417.43 samples/sec Loss 7.5574 LearningRate 0.0658 Epoch: 3 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:46:52,493-Speed 3350.08 samples/sec Loss 7.5300 LearningRate 0.0658 Epoch: 3 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:46:55,555-Speed 3346.04 samples/sec Loss 7.4945 LearningRate 0.0658 Epoch: 3 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:46:58,587-Speed 3378.11 samples/sec Loss 7.4286 LearningRate 0.0658 Epoch: 3 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:01,608-Speed 3390.59 samples/sec Loss 7.5402 LearningRate 0.0658 Epoch: 3 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:04,649-Speed 3368.60 samples/sec Loss 7.6316 LearningRate 0.0658 Epoch: 3 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:07,663-Speed 3398.98 samples/sec Loss 7.5075 LearningRate 0.0658 Epoch: 3 Global Step: 46930 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:10,701-Speed 3371.45 samples/sec Loss 7.5510 LearningRate 0.0658 Epoch: 3 Global Step: 46940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:13,718-Speed 3395.74 samples/sec Loss 7.6206 LearningRate 0.0658 Epoch: 3 Global Step: 46950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:16,721-Speed 3410.61 samples/sec Loss 7.5706 LearningRate 0.0658 Epoch: 3 Global Step: 46960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:19,725-Speed 3409.33 samples/sec Loss 7.3739 LearningRate 0.0658 Epoch: 3 Global Step: 46970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:22,751-Speed 3385.88 samples/sec Loss 7.5500 LearningRate 0.0658 Epoch: 3 Global Step: 46980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:25,807-Speed 3351.82 samples/sec Loss 7.4249 LearningRate 0.0657 Epoch: 3 Global Step: 46990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:28,815-Speed 3405.60 samples/sec Loss 7.4360 LearningRate 0.0657 Epoch: 3 Global Step: 47000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:31,920-Speed 3299.77 samples/sec Loss 7.6589 LearningRate 0.0657 Epoch: 3 Global Step: 47010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:34,949-Speed 3382.03 samples/sec Loss 7.5777 LearningRate 0.0657 Epoch: 3 Global Step: 47020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:47:38,017-Speed 3338.29 samples/sec Loss 7.3925 LearningRate 0.0657 Epoch: 3 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:41,055-Speed 3372.43 samples/sec Loss 7.4899 LearningRate 0.0657 Epoch: 3 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:44,098-Speed 3365.47 samples/sec Loss 7.4523 LearningRate 0.0657 Epoch: 3 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 05:47:47,143-Speed 3363.70 samples/sec Loss 7.4796 LearningRate 0.0657 Epoch: 3 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:50,190-Speed 3361.95 samples/sec Loss 7.5208 LearningRate 0.0657 Epoch: 3 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:53,228-Speed 3372.28 samples/sec Loss 7.3690 LearningRate 0.0657 Epoch: 3 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:56,235-Speed 3405.62 samples/sec Loss 7.5296 LearningRate 0.0657 Epoch: 3 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:47:59,258-Speed 3389.33 samples/sec Loss 7.6264 LearningRate 0.0657 Epoch: 3 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:48:02,353-Speed 3309.26 samples/sec Loss 7.4891 LearningRate 0.0657 Epoch: 3 Global Step: 47110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:48:05,347-Speed 3421.83 samples/sec Loss 7.3957 LearningRate 0.0657 Epoch: 3 Global Step: 47120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:08,389-Speed 3366.57 samples/sec Loss 7.4681 LearningRate 0.0657 Epoch: 3 Global Step: 47130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:11,438-Speed 3360.49 samples/sec Loss 7.5409 LearningRate 0.0656 Epoch: 3 Global Step: 47140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:14,514-Speed 3329.64 samples/sec Loss 7.4363 LearningRate 0.0656 Epoch: 3 Global Step: 47150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:17,619-Speed 3298.92 samples/sec Loss 7.4564 LearningRate 0.0656 Epoch: 3 Global Step: 47160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:20,670-Speed 3357.91 samples/sec Loss 7.5715 LearningRate 0.0656 Epoch: 3 Global Step: 47170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:23,765-Speed 3309.52 samples/sec Loss 
7.5233 LearningRate 0.0656 Epoch: 3 Global Step: 47180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:26,824-Speed 3348.77 samples/sec Loss 7.5094 LearningRate 0.0656 Epoch: 3 Global Step: 47190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:29,918-Speed 3310.94 samples/sec Loss 7.4545 LearningRate 0.0656 Epoch: 3 Global Step: 47200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:32,934-Speed 3396.27 samples/sec Loss 7.3134 LearningRate 0.0656 Epoch: 3 Global Step: 47210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:35,987-Speed 3355.26 samples/sec Loss 7.4936 LearningRate 0.0656 Epoch: 3 Global Step: 47220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:48:39,055-Speed 3338.41 samples/sec Loss 7.4490 LearningRate 0.0656 Epoch: 3 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:48:42,173-Speed 3285.25 samples/sec Loss 7.5536 LearningRate 0.0656 Epoch: 3 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:48:45,202-Speed 3382.23 samples/sec Loss 7.5702 LearningRate 0.0656 Epoch: 3 Global Step: 47250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:48,273-Speed 3335.68 samples/sec Loss 7.4725 LearningRate 0.0656 Epoch: 3 Global Step: 47260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:51,443-Speed 3230.90 samples/sec Loss 7.5166 LearningRate 0.0656 Epoch: 3 Global Step: 47270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:54,454-Speed 3402.03 samples/sec Loss 7.4793 LearningRate 0.0656 Epoch: 3 Global Step: 47280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:48:57,456-Speed 3411.97 samples/sec Loss 7.4064 LearningRate 0.0656 Epoch: 3 Global Step: 47290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:00,549-Speed 3312.77 samples/sec Loss 7.5076 LearningRate 0.0655 Epoch: 3 Global Step: 47300 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:03,556-Speed 3406.26 samples/sec Loss 7.4258 LearningRate 0.0655 Epoch: 3 Global Step: 47310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:06,584-Speed 3383.16 samples/sec Loss 7.4374 LearningRate 0.0655 Epoch: 3 Global Step: 47320 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:09,639-Speed 3353.30 samples/sec Loss 7.4203 LearningRate 0.0655 Epoch: 3 Global Step: 47330 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:12,750-Speed 3291.86 samples/sec Loss 7.3995 LearningRate 0.0655 Epoch: 3 Global Step: 47340 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:15,835-Speed 3321.29 samples/sec Loss 7.3837 LearningRate 0.0655 Epoch: 3 Global Step: 47350 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:18,929-Speed 3310.54 samples/sec Loss 7.5267 LearningRate 0.0655 Epoch: 3 Global Step: 47360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:21,938-Speed 3403.77 samples/sec Loss 7.5471 LearningRate 0.0655 Epoch: 3 Global Step: 47370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:24,943-Speed 3409.33 samples/sec Loss 7.3864 LearningRate 0.0655 Epoch: 3 Global Step: 47380 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:27,959-Speed 3396.00 samples/sec Loss 7.5132 LearningRate 0.0655 Epoch: 3 Global Step: 47390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:31,011-Speed 3356.85 samples/sec Loss 7.4237 LearningRate 0.0655 Epoch: 3 Global Step: 47400 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:34,037-Speed 3384.08 samples/sec Loss 7.5536 LearningRate 0.0655 Epoch: 3 Global Step: 47410 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:49:37,053-Speed 3397.32 samples/sec Loss 7.5215 LearningRate 0.0655 Epoch: 3 Global Step: 47420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:49:40,105-Speed 3355.99 samples/sec Loss 7.4887 LearningRate 0.0655 Epoch: 3 Global Step: 47430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:43,121-Speed 3396.52 samples/sec Loss 7.4057 LearningRate 0.0655 Epoch: 3 Global Step: 47440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:46,144-Speed 3387.75 samples/sec Loss 7.4331 LearningRate 0.0654 Epoch: 3 Global Step: 47450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:49,229-Speed 3320.55 samples/sec Loss 7.3925 LearningRate 0.0654 Epoch: 3 Global Step: 47460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:52,336-Speed 3297.31 samples/sec Loss 7.5261 LearningRate 0.0654 Epoch: 3 Global Step: 47470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:55,383-Speed 3361.52 samples/sec Loss 7.4637 LearningRate 0.0654 Epoch: 3 Global Step: 47480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:49:58,409-Speed 3385.62 samples/sec Loss 7.5226 LearningRate 0.0654 Epoch: 3 Global Step: 47490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:50:01,463-Speed 3353.61 samples/sec Loss 7.4592 LearningRate 0.0654 Epoch: 3 Global Step: 47500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:50:04,511-Speed 3360.77 samples/sec Loss 7.3833 LearningRate 0.0654 Epoch: 3 Global Step: 47510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:50:07,587-Speed 3330.38 samples/sec Loss 7.5192 LearningRate 0.0654 Epoch: 3 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:50:10,674-Speed 3317.96 samples/sec Loss 7.4152 LearningRate 0.0654 Epoch: 3 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:50:13,728-Speed 3353.86 samples/sec Loss 7.4165 LearningRate 0.0654 Epoch: 3 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:50:16,834-Speed 3298.14 samples/sec Loss 
7.4353 LearningRate 0.0654 Epoch: 3 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:50:19,850-Speed 3395.78 samples/sec Loss 7.4023 LearningRate 0.0654 Epoch: 3 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:50:22,889-Speed 3371.10 samples/sec Loss 7.4661 LearningRate 0.0654 Epoch: 3 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:50:25,952-Speed 3343.69 samples/sec Loss 7.5706 LearningRate 0.0654 Epoch: 3 Global Step: 47580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:50:28,973-Speed 3391.22 samples/sec Loss 7.5270 LearningRate 0.0654 Epoch: 3 Global Step: 47590 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:32,804-Speed 2673.50 samples/sec Loss 7.5108 LearningRate 0.0653 Epoch: 3 Global Step: 47600 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:35,849-Speed 3364.05 samples/sec Loss 7.4939 LearningRate 0.0653 Epoch: 3 Global Step: 47610 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:38,904-Speed 3351.76 samples/sec Loss 7.4908 LearningRate 0.0653 Epoch: 3 Global Step: 47620 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:42,006-Speed 3302.54 samples/sec Loss 7.4779 LearningRate 0.0653 Epoch: 3 Global Step: 47630 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:45,015-Speed 3403.84 samples/sec Loss 7.4330 LearningRate 0.0653 Epoch: 3 Global Step: 47640 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:48,086-Speed 3336.15 samples/sec Loss 7.4875 LearningRate 0.0653 Epoch: 3 Global Step: 47650 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:51,172-Speed 3318.78 samples/sec Loss 7.3955 LearningRate 0.0653 Epoch: 3 Global Step: 47660 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:54,210-Speed 3372.66 samples/sec Loss 7.4335 LearningRate 0.0653 Epoch: 3 Global Step: 47670 
Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:50:57,232-Speed 3388.69 samples/sec Loss 7.3667 LearningRate 0.0653 Epoch: 3 Global Step: 47680 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:51:00,309-Speed 3329.36 samples/sec Loss 7.4004 LearningRate 0.0653 Epoch: 3 Global Step: 47690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:03,345-Speed 3374.76 samples/sec Loss 7.5087 LearningRate 0.0653 Epoch: 3 Global Step: 47700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:06,364-Speed 3393.09 samples/sec Loss 7.5010 LearningRate 0.0653 Epoch: 3 Global Step: 47710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:09,369-Speed 3408.24 samples/sec Loss 7.2706 LearningRate 0.0653 Epoch: 3 Global Step: 47720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:12,407-Speed 3371.52 samples/sec Loss 7.4533 LearningRate 0.0653 Epoch: 3 Global Step: 47730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:15,540-Speed 3269.15 samples/sec Loss 7.4379 LearningRate 0.0653 Epoch: 3 Global Step: 47740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:18,644-Speed 3300.59 samples/sec Loss 7.2543 LearningRate 0.0653 Epoch: 3 Global Step: 47750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:21,678-Speed 3375.94 samples/sec Loss 7.3989 LearningRate 0.0652 Epoch: 3 Global Step: 47760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:24,715-Speed 3372.49 samples/sec Loss 7.3722 LearningRate 0.0652 Epoch: 3 Global Step: 47770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:27,760-Speed 3364.52 samples/sec Loss 7.3959 LearningRate 0.0652 Epoch: 3 Global Step: 47780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:30,812-Speed 3355.44 samples/sec Loss 7.4864 LearningRate 0.0652 Epoch: 3 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 05:51:33,849-Speed 3373.42 samples/sec Loss 7.3900 LearningRate 0.0652 Epoch: 3 Global Step: 47800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:36,927-Speed 3327.30 samples/sec Loss 7.5271 LearningRate 0.0652 Epoch: 3 Global Step: 47810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:40,010-Speed 3322.57 samples/sec Loss 7.4526 LearningRate 0.0652 Epoch: 3 Global Step: 47820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:43,087-Speed 3329.20 samples/sec Loss 7.4784 LearningRate 0.0652 Epoch: 3 Global Step: 47830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:46,118-Speed 3379.26 samples/sec Loss 7.3459 LearningRate 0.0652 Epoch: 3 Global Step: 47840 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:49,191-Speed 3333.62 samples/sec Loss 7.5201 LearningRate 0.0652 Epoch: 3 Global Step: 47850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:52,242-Speed 3357.06 samples/sec Loss 7.4380 LearningRate 0.0652 Epoch: 3 Global Step: 47860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:55,346-Speed 3300.19 samples/sec Loss 7.4205 LearningRate 0.0652 Epoch: 3 Global Step: 47870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:51:58,392-Speed 3363.29 samples/sec Loss 7.4874 LearningRate 0.0652 Epoch: 3 Global Step: 47880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:01,489-Speed 3306.98 samples/sec Loss 7.3989 LearningRate 0.0652 Epoch: 3 Global Step: 47890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:04,570-Speed 3325.72 samples/sec Loss 7.2857 LearningRate 0.0652 Epoch: 3 Global Step: 47900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:52:07,624-Speed 3353.36 samples/sec Loss 7.4489 LearningRate 0.0651 Epoch: 3 Global Step: 47910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:52:10,681-Speed 3350.97 samples/sec Loss 
7.4036 LearningRate 0.0651 Epoch: 3 Global Step: 47920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:52:13,729-Speed 3360.60 samples/sec Loss 7.3616 LearningRate 0.0651 Epoch: 3 Global Step: 47930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:52:16,856-Speed 3276.68 samples/sec Loss 7.4354 LearningRate 0.0651 Epoch: 3 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:52:19,874-Speed 3393.25 samples/sec Loss 7.3643 LearningRate 0.0651 Epoch: 3 Global Step: 47950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:22,908-Speed 3377.18 samples/sec Loss 7.5316 LearningRate 0.0651 Epoch: 3 Global Step: 47960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:25,932-Speed 3386.26 samples/sec Loss 7.5232 LearningRate 0.0651 Epoch: 3 Global Step: 47970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:28,921-Speed 3427.56 samples/sec Loss 7.4714 LearningRate 0.0651 Epoch: 3 Global Step: 47980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:32,057-Speed 3266.55 samples/sec Loss 7.4952 LearningRate 0.0651 Epoch: 3 Global Step: 47990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:35,132-Speed 3330.93 samples/sec Loss 7.4841 LearningRate 0.0651 Epoch: 3 Global Step: 48000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:38,220-Speed 3316.59 samples/sec Loss 7.3958 LearningRate 0.0651 Epoch: 3 Global Step: 48010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:41,276-Speed 3351.53 samples/sec Loss 7.4213 LearningRate 0.0651 Epoch: 3 Global Step: 48020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:44,335-Speed 3349.77 samples/sec Loss 7.4138 LearningRate 0.0651 Epoch: 3 Global Step: 48030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:47,347-Speed 3400.15 samples/sec Loss 7.3709 LearningRate 0.0651 Epoch: 3 Global Step: 48040 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:50,369-Speed 3390.60 samples/sec Loss 7.4594 LearningRate 0.0651 Epoch: 3 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:52:53,378-Speed 3403.99 samples/sec Loss 7.4280 LearningRate 0.0651 Epoch: 3 Global Step: 48060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:56,443-Speed 3342.04 samples/sec Loss 7.4830 LearningRate 0.0650 Epoch: 3 Global Step: 48070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:52:59,468-Speed 3386.62 samples/sec Loss 7.5154 LearningRate 0.0650 Epoch: 3 Global Step: 48080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:02,518-Speed 3357.52 samples/sec Loss 7.4415 LearningRate 0.0650 Epoch: 3 Global Step: 48090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:05,615-Speed 3308.11 samples/sec Loss 7.4808 LearningRate 0.0650 Epoch: 3 Global Step: 48100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:08,614-Speed 3415.29 samples/sec Loss 7.4165 LearningRate 0.0650 Epoch: 3 Global Step: 48110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:11,631-Speed 3395.08 samples/sec Loss 7.5033 LearningRate 0.0650 Epoch: 3 Global Step: 48120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:14,650-Speed 3393.14 samples/sec Loss 7.5491 LearningRate 0.0650 Epoch: 3 Global Step: 48130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:17,669-Speed 3392.67 samples/sec Loss 7.3772 LearningRate 0.0650 Epoch: 3 Global Step: 48140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:20,716-Speed 3362.01 samples/sec Loss 7.4575 LearningRate 0.0650 Epoch: 3 Global Step: 48150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:53:23,726-Speed 3402.99 samples/sec Loss 7.4596 LearningRate 0.0650 Epoch: 3 Global Step: 48160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 05:53:26,767-Speed 3368.82 samples/sec Loss 7.4429 LearningRate 0.0650 Epoch: 3 Global Step: 48170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:29,794-Speed 3383.56 samples/sec Loss 7.4392 LearningRate 0.0650 Epoch: 3 Global Step: 48180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:32,836-Speed 3367.52 samples/sec Loss 7.4037 LearningRate 0.0650 Epoch: 3 Global Step: 48190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:35,924-Speed 3317.45 samples/sec Loss 7.3755 LearningRate 0.0650 Epoch: 3 Global Step: 48200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:38,949-Speed 3386.06 samples/sec Loss 7.2926 LearningRate 0.0650 Epoch: 3 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:42,050-Speed 3302.65 samples/sec Loss 7.2513 LearningRate 0.0649 Epoch: 3 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:45,059-Speed 3404.01 samples/sec Loss 7.3887 LearningRate 0.0649 Epoch: 3 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:48,058-Speed 3416.10 samples/sec Loss 7.3108 LearningRate 0.0649 Epoch: 3 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:51,134-Speed 3329.64 samples/sec Loss 7.3798 LearningRate 0.0649 Epoch: 3 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:53:54,168-Speed 3376.42 samples/sec Loss 7.5587 LearningRate 0.0649 Epoch: 3 Global Step: 48260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 05:53:57,152-Speed 3433.73 samples/sec Loss 7.5372 LearningRate 0.0649 Epoch: 3 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:54:00,194-Speed 3366.99 samples/sec Loss 7.3866 LearningRate 0.0649 Epoch: 3 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:54:03,247-Speed 3355.50 samples/sec Loss 
7.4244 LearningRate 0.0649 Epoch: 3 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:54:06,271-Speed 3386.65 samples/sec Loss 7.4535 LearningRate 0.0649 Epoch: 3 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:54:09,264-Speed 3423.28 samples/sec Loss 7.3902 LearningRate 0.0649 Epoch: 3 Global Step: 48310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:12,283-Speed 3392.78 samples/sec Loss 7.2679 LearningRate 0.0649 Epoch: 3 Global Step: 48320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:15,339-Speed 3352.18 samples/sec Loss 7.4595 LearningRate 0.0649 Epoch: 3 Global Step: 48330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:18,414-Speed 3330.64 samples/sec Loss 7.3711 LearningRate 0.0649 Epoch: 3 Global Step: 48340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:21,425-Speed 3402.55 samples/sec Loss 7.5681 LearningRate 0.0649 Epoch: 3 Global Step: 48350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:24,471-Speed 3362.63 samples/sec Loss 7.4723 LearningRate 0.0649 Epoch: 3 Global Step: 48360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:27,592-Speed 3281.73 samples/sec Loss 7.3366 LearningRate 0.0648 Epoch: 3 Global Step: 48370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:30,673-Speed 3324.89 samples/sec Loss 7.3838 LearningRate 0.0648 Epoch: 3 Global Step: 48380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:33,750-Speed 3328.35 samples/sec Loss 7.4549 LearningRate 0.0648 Epoch: 3 Global Step: 48390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:36,793-Speed 3366.46 samples/sec Loss 7.3573 LearningRate 0.0648 Epoch: 3 Global Step: 48400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:39,901-Speed 3296.08 samples/sec Loss 7.5862 LearningRate 0.0648 Epoch: 3 Global Step: 48410 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:54:42,981-Speed 3325.77 samples/sec Loss 7.4326 LearningRate 0.0648 Epoch: 3 Global Step: 48420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:46,017-Speed 3373.70 samples/sec Loss 7.3540 LearningRate 0.0648 Epoch: 3 Global Step: 48430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:49,046-Speed 3382.38 samples/sec Loss 7.4169 LearningRate 0.0648 Epoch: 3 Global Step: 48440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:52,087-Speed 3367.93 samples/sec Loss 7.5426 LearningRate 0.0648 Epoch: 3 Global Step: 48450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:55,178-Speed 3314.02 samples/sec Loss 7.3504 LearningRate 0.0648 Epoch: 3 Global Step: 48460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:54:58,186-Speed 3405.28 samples/sec Loss 7.4606 LearningRate 0.0648 Epoch: 3 Global Step: 48470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:01,239-Speed 3354.53 samples/sec Loss 7.4635 LearningRate 0.0648 Epoch: 3 Global Step: 48480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:04,350-Speed 3291.99 samples/sec Loss 7.4591 LearningRate 0.0648 Epoch: 3 Global Step: 48490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:07,424-Speed 3332.86 samples/sec Loss 7.5023 LearningRate 0.0648 Epoch: 3 Global Step: 48500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:10,431-Speed 3406.01 samples/sec Loss 7.3890 LearningRate 0.0648 Epoch: 3 Global Step: 48510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:13,481-Speed 3359.23 samples/sec Loss 7.4511 LearningRate 0.0648 Epoch: 3 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:55:16,537-Speed 3351.55 samples/sec Loss 7.4751 LearningRate 0.0647 Epoch: 3 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 05:55:19,568-Speed 3379.23 samples/sec Loss 7.3730 LearningRate 0.0647 Epoch: 3 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:55:22,589-Speed 3391.16 samples/sec Loss 7.4218 LearningRate 0.0647 Epoch: 3 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:55:25,593-Speed 3410.38 samples/sec Loss 7.3004 LearningRate 0.0647 Epoch: 3 Global Step: 48560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:28,677-Speed 3320.56 samples/sec Loss 7.3180 LearningRate 0.0647 Epoch: 3 Global Step: 48570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:31,713-Speed 3374.40 samples/sec Loss 7.3011 LearningRate 0.0647 Epoch: 3 Global Step: 48580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:34,795-Speed 3323.05 samples/sec Loss 7.4239 LearningRate 0.0647 Epoch: 3 Global Step: 48590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:37,823-Speed 3382.73 samples/sec Loss 7.4916 LearningRate 0.0647 Epoch: 3 Global Step: 48600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:40,850-Speed 3384.75 samples/sec Loss 7.3122 LearningRate 0.0647 Epoch: 3 Global Step: 48610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:43,850-Speed 3414.60 samples/sec Loss 7.3590 LearningRate 0.0647 Epoch: 3 Global Step: 48620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:46,885-Speed 3374.68 samples/sec Loss 7.4737 LearningRate 0.0647 Epoch: 3 Global Step: 48630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:49,956-Speed 3335.65 samples/sec Loss 7.4382 LearningRate 0.0647 Epoch: 3 Global Step: 48640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:52,973-Speed 3395.21 samples/sec Loss 7.4171 LearningRate 0.0647 Epoch: 3 Global Step: 48650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:55:55,974-Speed 3413.23 samples/sec Loss 
7.4903 LearningRate 0.0647 Epoch: 3 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:55:58,978-Speed 3410.24 samples/sec Loss 7.4166 LearningRate 0.0647 Epoch: 3 Global Step: 48670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:02,027-Speed 3359.47 samples/sec Loss 7.3477 LearningRate 0.0646 Epoch: 3 Global Step: 48680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:05,109-Speed 3322.92 samples/sec Loss 7.3723 LearningRate 0.0646 Epoch: 3 Global Step: 48690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:08,113-Speed 3410.62 samples/sec Loss 7.4307 LearningRate 0.0646 Epoch: 3 Global Step: 48700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:11,141-Speed 3382.47 samples/sec Loss 7.3845 LearningRate 0.0646 Epoch: 3 Global Step: 48710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:14,151-Speed 3402.87 samples/sec Loss 7.4202 LearningRate 0.0646 Epoch: 3 Global Step: 48720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:17,266-Speed 3288.99 samples/sec Loss 7.4201 LearningRate 0.0646 Epoch: 3 Global Step: 48730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:20,263-Speed 3418.41 samples/sec Loss 7.4225 LearningRate 0.0646 Epoch: 3 Global Step: 48740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:23,278-Speed 3396.69 samples/sec Loss 7.2835 LearningRate 0.0646 Epoch: 3 Global Step: 48750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:26,325-Speed 3362.61 samples/sec Loss 7.3988 LearningRate 0.0646 Epoch: 3 Global Step: 48760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:29,335-Speed 3402.87 samples/sec Loss 7.5096 LearningRate 0.0646 Epoch: 3 Global Step: 48770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:32,356-Speed 3391.01 samples/sec Loss 7.2370 LearningRate 0.0646 Epoch: 3 Global Step: 48780 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:35,420-Speed 3342.78 samples/sec Loss 7.4892 LearningRate 0.0646 Epoch: 3 Global Step: 48790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:38,440-Speed 3392.16 samples/sec Loss 7.4266 LearningRate 0.0646 Epoch: 3 Global Step: 48800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:41,494-Speed 3354.10 samples/sec Loss 7.4052 LearningRate 0.0646 Epoch: 3 Global Step: 48810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:44,501-Speed 3406.42 samples/sec Loss 7.3151 LearningRate 0.0646 Epoch: 3 Global Step: 48820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:47,562-Speed 3346.64 samples/sec Loss 7.3970 LearningRate 0.0646 Epoch: 3 Global Step: 48830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:50,584-Speed 3390.00 samples/sec Loss 7.3762 LearningRate 0.0645 Epoch: 3 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:53,604-Speed 3391.52 samples/sec Loss 7.3612 LearningRate 0.0645 Epoch: 3 Global Step: 48850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:56,656-Speed 3356.02 samples/sec Loss 7.3618 LearningRate 0.0645 Epoch: 3 Global Step: 48860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:56:59,695-Speed 3370.69 samples/sec Loss 7.3846 LearningRate 0.0645 Epoch: 3 Global Step: 48870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:02,743-Speed 3360.76 samples/sec Loss 7.4775 LearningRate 0.0645 Epoch: 3 Global Step: 48880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:05,793-Speed 3357.52 samples/sec Loss 7.4702 LearningRate 0.0645 Epoch: 3 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:08,841-Speed 3361.09 samples/sec Loss 7.4669 LearningRate 0.0645 Epoch: 3 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 05:57:11,897-Speed 3351.72 samples/sec Loss 7.3834 LearningRate 0.0645 Epoch: 3 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:14,910-Speed 3400.34 samples/sec Loss 7.4417 LearningRate 0.0645 Epoch: 3 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:17,984-Speed 3332.07 samples/sec Loss 7.3321 LearningRate 0.0645 Epoch: 3 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:21,028-Speed 3364.96 samples/sec Loss 7.6278 LearningRate 0.0645 Epoch: 3 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:24,048-Speed 3392.05 samples/sec Loss 7.4511 LearningRate 0.0645 Epoch: 3 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:27,044-Speed 3419.63 samples/sec Loss 7.3968 LearningRate 0.0645 Epoch: 3 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:30,034-Speed 3426.07 samples/sec Loss 7.3695 LearningRate 0.0645 Epoch: 3 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:33,080-Speed 3362.27 samples/sec Loss 7.3505 LearningRate 0.0645 Epoch: 3 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:36,104-Speed 3387.08 samples/sec Loss 7.3195 LearningRate 0.0644 Epoch: 3 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:39,134-Speed 3380.54 samples/sec Loss 7.3902 LearningRate 0.0644 Epoch: 3 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:42,184-Speed 3358.39 samples/sec Loss 7.3653 LearningRate 0.0644 Epoch: 3 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:57:45,173-Speed 3427.99 samples/sec Loss 7.4076 LearningRate 0.0644 Epoch: 3 Global Step: 49020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:57:48,194-Speed 3390.23 samples/sec Loss 
7.4403 LearningRate 0.0644 Epoch: 3 Global Step: 49030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:57:51,214-Speed 3391.94 samples/sec Loss 7.4941 LearningRate 0.0644 Epoch: 3 Global Step: 49040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:57:54,225-Speed 3401.60 samples/sec Loss 7.3684 LearningRate 0.0644 Epoch: 3 Global Step: 49050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:57:57,221-Speed 3419.50 samples/sec Loss 7.4788 LearningRate 0.0644 Epoch: 3 Global Step: 49060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:00,233-Speed 3399.78 samples/sec Loss 7.4506 LearningRate 0.0644 Epoch: 3 Global Step: 49070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:03,275-Speed 3367.82 samples/sec Loss 7.2488 LearningRate 0.0644 Epoch: 3 Global Step: 49080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:06,292-Speed 3395.53 samples/sec Loss 7.4489 LearningRate 0.0644 Epoch: 3 Global Step: 49090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:09,288-Speed 3419.13 samples/sec Loss 7.3925 LearningRate 0.0644 Epoch: 3 Global Step: 49100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:12,304-Speed 3395.79 samples/sec Loss 7.3568 LearningRate 0.0644 Epoch: 3 Global Step: 49110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:15,378-Speed 3332.74 samples/sec Loss 7.4013 LearningRate 0.0644 Epoch: 3 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:58:18,394-Speed 3395.45 samples/sec Loss 7.3805 LearningRate 0.0644 Epoch: 3 Global Step: 49130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:21,392-Speed 3416.80 samples/sec Loss 7.3923 LearningRate 0.0644 Epoch: 3 Global Step: 49140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:24,399-Speed 3407.09 samples/sec Loss 7.4283 LearningRate 0.0643 Epoch: 3 Global Step: 49150 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:27,414-Speed 3397.48 samples/sec Loss 7.4036 LearningRate 0.0643 Epoch: 3 Global Step: 49160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:30,450-Speed 3373.70 samples/sec Loss 7.5565 LearningRate 0.0643 Epoch: 3 Global Step: 49170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:33,443-Speed 3422.06 samples/sec Loss 7.3895 LearningRate 0.0643 Epoch: 3 Global Step: 49180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:36,444-Speed 3413.97 samples/sec Loss 7.4809 LearningRate 0.0643 Epoch: 3 Global Step: 49190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:39,485-Speed 3368.03 samples/sec Loss 7.2959 LearningRate 0.0643 Epoch: 3 Global Step: 49200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:42,565-Speed 3325.26 samples/sec Loss 7.4158 LearningRate 0.0643 Epoch: 3 Global Step: 49210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:45,560-Speed 3420.82 samples/sec Loss 7.3708 LearningRate 0.0643 Epoch: 3 Global Step: 49220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:58:48,575-Speed 3397.22 samples/sec Loss 7.3586 LearningRate 0.0643 Epoch: 3 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:58:51,609-Speed 3376.50 samples/sec Loss 7.2999 LearningRate 0.0643 Epoch: 3 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:58:54,635-Speed 3385.15 samples/sec Loss 7.4250 LearningRate 0.0643 Epoch: 3 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 05:58:57,620-Speed 3432.23 samples/sec Loss 7.5683 LearningRate 0.0643 Epoch: 3 Global Step: 49260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:00,673-Speed 3354.76 samples/sec Loss 7.3563 LearningRate 0.0643 Epoch: 3 Global Step: 49270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 05:59:03,756-Speed 3321.68 samples/sec Loss 7.4233 LearningRate 0.0643 Epoch: 3 Global Step: 49280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:06,824-Speed 3338.97 samples/sec Loss 7.3809 LearningRate 0.0643 Epoch: 3 Global Step: 49290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:09,856-Speed 3378.91 samples/sec Loss 7.3679 LearningRate 0.0642 Epoch: 3 Global Step: 49300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:12,917-Speed 3346.28 samples/sec Loss 7.4707 LearningRate 0.0642 Epoch: 3 Global Step: 49310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:15,923-Speed 3407.76 samples/sec Loss 7.3077 LearningRate 0.0642 Epoch: 3 Global Step: 49320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:18,975-Speed 3355.76 samples/sec Loss 7.4431 LearningRate 0.0642 Epoch: 3 Global Step: 49330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:21,983-Speed 3405.54 samples/sec Loss 7.3820 LearningRate 0.0642 Epoch: 3 Global Step: 49340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:25,010-Speed 3383.38 samples/sec Loss 7.4001 LearningRate 0.0642 Epoch: 3 Global Step: 49350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 05:59:28,049-Speed 3371.06 samples/sec Loss 7.3114 LearningRate 0.0642 Epoch: 3 Global Step: 49360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:31,079-Speed 3380.93 samples/sec Loss 7.4571 LearningRate 0.0642 Epoch: 3 Global Step: 49370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:34,119-Speed 3369.36 samples/sec Loss 7.3457 LearningRate 0.0642 Epoch: 3 Global Step: 49380 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:37,177-Speed 3349.02 samples/sec Loss 7.4850 LearningRate 0.0642 Epoch: 3 Global Step: 49390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:40,303-Speed 3277.05 samples/sec Loss 
7.3271 LearningRate 0.0642 Epoch: 3 Global Step: 49400 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:43,357-Speed 3354.32 samples/sec Loss 7.3842 LearningRate 0.0642 Epoch: 3 Global Step: 49410 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:46,421-Speed 3342.88 samples/sec Loss 7.2858 LearningRate 0.0642 Epoch: 3 Global Step: 49420 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:49,532-Speed 3293.61 samples/sec Loss 7.3586 LearningRate 0.0642 Epoch: 3 Global Step: 49430 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:52,589-Speed 3350.92 samples/sec Loss 7.2788 LearningRate 0.0642 Epoch: 3 Global Step: 49440 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:55,647-Speed 3348.90 samples/sec Loss 7.4288 LearningRate 0.0642 Epoch: 3 Global Step: 49450 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 05:59:58,694-Speed 3362.04 samples/sec Loss 7.3514 LearningRate 0.0641 Epoch: 3 Global Step: 49460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:01,794-Speed 3304.37 samples/sec Loss 7.3069 LearningRate 0.0641 Epoch: 3 Global Step: 49470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:04,857-Speed 3344.10 samples/sec Loss 7.4046 LearningRate 0.0641 Epoch: 3 Global Step: 49480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:07,916-Speed 3349.46 samples/sec Loss 7.4376 LearningRate 0.0641 Epoch: 3 Global Step: 49490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:10,938-Speed 3389.55 samples/sec Loss 7.3511 LearningRate 0.0641 Epoch: 3 Global Step: 49500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:14,002-Speed 3343.17 samples/sec Loss 7.3578 LearningRate 0.0641 Epoch: 3 Global Step: 49510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:17,084-Speed 3323.52 samples/sec Loss 7.4514 LearningRate 0.0641 Epoch: 3 Global Step: 49520 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:20,118-Speed 3376.12 samples/sec Loss 7.3770 LearningRate 0.0641 Epoch: 3 Global Step: 49530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:23,177-Speed 3348.48 samples/sec Loss 7.3169 LearningRate 0.0641 Epoch: 3 Global Step: 49540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:26,188-Speed 3401.59 samples/sec Loss 7.4222 LearningRate 0.0641 Epoch: 3 Global Step: 49550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:29,228-Speed 3369.35 samples/sec Loss 7.4572 LearningRate 0.0641 Epoch: 3 Global Step: 49560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:00:32,260-Speed 3378.89 samples/sec Loss 7.2746 LearningRate 0.0641 Epoch: 3 Global Step: 49570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:00:35,292-Speed 3378.76 samples/sec Loss 7.4437 LearningRate 0.0641 Epoch: 3 Global Step: 49580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:00:38,314-Speed 3389.42 samples/sec Loss 7.3857 LearningRate 0.0641 Epoch: 3 Global Step: 49590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:00:41,393-Speed 3326.61 samples/sec Loss 7.3480 LearningRate 0.0641 Epoch: 3 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:00:44,416-Speed 3388.88 samples/sec Loss 7.4069 LearningRate 0.0640 Epoch: 3 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:00:47,441-Speed 3386.49 samples/sec Loss 7.4283 LearningRate 0.0640 Epoch: 3 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:00:50,477-Speed 3373.66 samples/sec Loss 7.5518 LearningRate 0.0640 Epoch: 3 Global Step: 49630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:53,508-Speed 3380.09 samples/sec Loss 7.4119 LearningRate 0.0640 Epoch: 3 Global Step: 49640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 06:00:56,528-Speed 3391.94 samples/sec Loss 7.3694 LearningRate 0.0640 Epoch: 3 Global Step: 49650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:00:59,575-Speed 3361.10 samples/sec Loss 7.4503 LearningRate 0.0640 Epoch: 3 Global Step: 49660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:01:02,647-Speed 3334.94 samples/sec Loss 7.2786 LearningRate 0.0640 Epoch: 3 Global Step: 49670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:01:05,861-Speed 3186.82 samples/sec Loss 7.4540 LearningRate 0.0640 Epoch: 3 Global Step: 49680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:01:37,183-Speed 326.94 samples/sec Loss 6.4911 LearningRate 0.0640 Epoch: 4 Global Step: 49690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:01:40,643-Speed 2961.18 samples/sec Loss 5.8580 LearningRate 0.0640 Epoch: 4 Global Step: 49700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:01:43,788-Speed 3256.70 samples/sec Loss 5.7370 LearningRate 0.0640 Epoch: 4 Global Step: 49710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:01:46,827-Speed 3371.01 samples/sec Loss 5.7527 LearningRate 0.0640 Epoch: 4 Global Step: 49720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:01:49,916-Speed 3316.11 samples/sec Loss 5.7446 LearningRate 0.0640 Epoch: 4 Global Step: 49730 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:01:53,023-Speed 3296.68 samples/sec Loss 5.7072 LearningRate 0.0640 Epoch: 4 Global Step: 49740 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:01:56,292-Speed 3133.05 samples/sec Loss 5.7311 LearningRate 0.0640 Epoch: 4 Global Step: 49750 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:01:59,608-Speed 3088.80 samples/sec Loss 5.6475 LearningRate 0.0640 Epoch: 4 Global Step: 49760 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:02:02,667-Speed 3349.05 samples/sec Loss 
5.8023 LearningRate 0.0639 Epoch: 4 Global Step: 49770 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:02:06,426-Speed 2724.66 samples/sec Loss 5.8583 LearningRate 0.0639 Epoch: 4 Global Step: 49780 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:02:09,446-Speed 3392.20 samples/sec Loss 5.7017 LearningRate 0.0639 Epoch: 4 Global Step: 49790 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:02:12,533-Speed 3317.59 samples/sec Loss 5.7644 LearningRate 0.0639 Epoch: 4 Global Step: 49800 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:02:15,573-Speed 3369.29 samples/sec Loss 5.8430 LearningRate 0.0639 Epoch: 4 Global Step: 49810 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:02:18,634-Speed 3346.82 samples/sec Loss 5.7297 LearningRate 0.0639 Epoch: 4 Global Step: 49820 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:02:21,688-Speed 3353.89 samples/sec Loss 5.8352 LearningRate 0.0639 Epoch: 4 Global Step: 49830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:24,779-Speed 3314.61 samples/sec Loss 5.8415 LearningRate 0.0639 Epoch: 4 Global Step: 49840 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:27,826-Speed 3361.97 samples/sec Loss 5.7908 LearningRate 0.0639 Epoch: 4 Global Step: 49850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:30,876-Speed 3357.53 samples/sec Loss 5.7888 LearningRate 0.0639 Epoch: 4 Global Step: 49860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:33,909-Speed 3378.16 samples/sec Loss 5.8243 LearningRate 0.0639 Epoch: 4 Global Step: 49870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:37,084-Speed 3226.11 samples/sec Loss 5.9023 LearningRate 0.0639 Epoch: 4 Global Step: 49880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:40,126-Speed 3367.14 samples/sec Loss 5.8562 LearningRate 0.0639 Epoch: 4 Global Step: 49890 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:43,215-Speed 3315.30 samples/sec Loss 5.8236 LearningRate 0.0639 Epoch: 4 Global Step: 49900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:46,294-Speed 3327.32 samples/sec Loss 5.7990 LearningRate 0.0639 Epoch: 4 Global Step: 49910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:49,371-Speed 3329.39 samples/sec Loss 5.7256 LearningRate 0.0638 Epoch: 4 Global Step: 49920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:02:52,427-Speed 3351.99 samples/sec Loss 5.7038 LearningRate 0.0638 Epoch: 4 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:02:55,477-Speed 3358.24 samples/sec Loss 5.8472 LearningRate 0.0638 Epoch: 4 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:02:58,533-Speed 3351.51 samples/sec Loss 5.6893 LearningRate 0.0638 Epoch: 4 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:01,582-Speed 3360.22 samples/sec Loss 5.7444 LearningRate 0.0638 Epoch: 4 Global Step: 49960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:04,660-Speed 3327.09 samples/sec Loss 5.8077 LearningRate 0.0638 Epoch: 4 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:07,681-Speed 3391.04 samples/sec Loss 5.8358 LearningRate 0.0638 Epoch: 4 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:10,723-Speed 3367.72 samples/sec Loss 5.9078 LearningRate 0.0638 Epoch: 4 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:13,769-Speed 3363.08 samples/sec Loss 5.8033 LearningRate 0.0638 Epoch: 4 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:16,805-Speed 3373.43 samples/sec Loss 5.7877 LearningRate 0.0638 Epoch: 4 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 06:03:19,863-Speed 3350.29 samples/sec Loss 5.9369 LearningRate 0.0638 Epoch: 4 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:22,897-Speed 3375.44 samples/sec Loss 5.8323 LearningRate 0.0638 Epoch: 4 Global Step: 50030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 06:03:25,907-Speed 3402.93 samples/sec Loss 5.8858 LearningRate 0.0638 Epoch: 4 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:28,945-Speed 3371.83 samples/sec Loss 6.0556 LearningRate 0.0638 Epoch: 4 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:31,976-Speed 3379.39 samples/sec Loss 5.9424 LearningRate 0.0638 Epoch: 4 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:35,007-Speed 3379.05 samples/sec Loss 5.8885 LearningRate 0.0638 Epoch: 4 Global Step: 50070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:38,069-Speed 3345.90 samples/sec Loss 5.9255 LearningRate 0.0637 Epoch: 4 Global Step: 50080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:42,967-Speed 2091.05 samples/sec Loss 5.8794 LearningRate 0.0637 Epoch: 4 Global Step: 50090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:45,987-Speed 3392.05 samples/sec Loss 5.9570 LearningRate 0.0637 Epoch: 4 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:49,027-Speed 3369.06 samples/sec Loss 5.8983 LearningRate 0.0637 Epoch: 4 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:52,081-Speed 3354.44 samples/sec Loss 5.8665 LearningRate 0.0637 Epoch: 4 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:55,106-Speed 3385.68 samples/sec Loss 5.7419 LearningRate 0.0637 Epoch: 4 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:03:58,134-Speed 3383.72 samples/sec Loss 
5.8973 LearningRate 0.0637 Epoch: 4 Global Step: 50140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:04:01,211-Speed 3327.74 samples/sec Loss 6.0351 LearningRate 0.0637 Epoch: 4 Global Step: 50150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:04:04,245-Speed 3376.98 samples/sec Loss 5.9115 LearningRate 0.0637 Epoch: 4 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:07,300-Speed 3353.88 samples/sec Loss 5.9294 LearningRate 0.0637 Epoch: 4 Global Step: 50170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:10,308-Speed 3404.51 samples/sec Loss 5.9407 LearningRate 0.0637 Epoch: 4 Global Step: 50180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:13,360-Speed 3356.77 samples/sec Loss 6.0514 LearningRate 0.0637 Epoch: 4 Global Step: 50190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:16,381-Speed 3390.37 samples/sec Loss 5.9900 LearningRate 0.0637 Epoch: 4 Global Step: 50200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:19,419-Speed 3371.52 samples/sec Loss 5.9036 LearningRate 0.0637 Epoch: 4 Global Step: 50210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:22,448-Speed 3382.23 samples/sec Loss 5.9126 LearningRate 0.0637 Epoch: 4 Global Step: 50220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:25,478-Speed 3380.45 samples/sec Loss 6.0146 LearningRate 0.0636 Epoch: 4 Global Step: 50230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:28,532-Speed 3354.43 samples/sec Loss 6.0266 LearningRate 0.0636 Epoch: 4 Global Step: 50240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:31,562-Speed 3380.52 samples/sec Loss 5.9863 LearningRate 0.0636 Epoch: 4 Global Step: 50250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:34,567-Speed 3408.93 samples/sec Loss 6.0619 LearningRate 0.0636 Epoch: 4 Global Step: 50260 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:04:37,610-Speed 3366.24 samples/sec Loss 5.9595 LearningRate 0.0636 Epoch: 4 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:04:40,684-Speed 3331.59 samples/sec Loss 5.9951 LearningRate 0.0636 Epoch: 4 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:04:43,719-Speed 3374.99 samples/sec Loss 6.1210 LearningRate 0.0636 Epoch: 4 Global Step: 50290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:46,749-Speed 3380.41 samples/sec Loss 5.9592 LearningRate 0.0636 Epoch: 4 Global Step: 50300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:49,784-Speed 3375.95 samples/sec Loss 5.9981 LearningRate 0.0636 Epoch: 4 Global Step: 50310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:52,824-Speed 3368.50 samples/sec Loss 6.0093 LearningRate 0.0636 Epoch: 4 Global Step: 50320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:55,862-Speed 3372.04 samples/sec Loss 6.0605 LearningRate 0.0636 Epoch: 4 Global Step: 50330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:04:58,863-Speed 3413.13 samples/sec Loss 6.0509 LearningRate 0.0636 Epoch: 4 Global Step: 50340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:01,944-Speed 3324.46 samples/sec Loss 6.1337 LearningRate 0.0636 Epoch: 4 Global Step: 50350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:05,025-Speed 3325.09 samples/sec Loss 5.8619 LearningRate 0.0636 Epoch: 4 Global Step: 50360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:08,086-Speed 3346.12 samples/sec Loss 6.0491 LearningRate 0.0636 Epoch: 4 Global Step: 50370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:11,105-Speed 3393.03 samples/sec Loss 6.0307 LearningRate 0.0636 Epoch: 4 Global Step: 50380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 06:05:14,117-Speed 3400.65 samples/sec Loss 6.0242 LearningRate 0.0635 Epoch: 4 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:05:17,195-Speed 3328.30 samples/sec Loss 6.0469 LearningRate 0.0635 Epoch: 4 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:05:20,240-Speed 3363.63 samples/sec Loss 5.9578 LearningRate 0.0635 Epoch: 4 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:05:23,277-Speed 3372.91 samples/sec Loss 6.0104 LearningRate 0.0635 Epoch: 4 Global Step: 50420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:05:26,321-Speed 3365.12 samples/sec Loss 6.0126 LearningRate 0.0635 Epoch: 4 Global Step: 50430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:05:29,390-Speed 3337.50 samples/sec Loss 5.9742 LearningRate 0.0635 Epoch: 4 Global Step: 50440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:05:32,412-Speed 3390.45 samples/sec Loss 6.0262 LearningRate 0.0635 Epoch: 4 Global Step: 50450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:05:35,446-Speed 3376.67 samples/sec Loss 6.0799 LearningRate 0.0635 Epoch: 4 Global Step: 50460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:38,463-Speed 3395.14 samples/sec Loss 6.0607 LearningRate 0.0635 Epoch: 4 Global Step: 50470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:41,517-Speed 3354.37 samples/sec Loss 6.1063 LearningRate 0.0635 Epoch: 4 Global Step: 50480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:44,577-Speed 3347.11 samples/sec Loss 6.2387 LearningRate 0.0635 Epoch: 4 Global Step: 50490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:47,617-Speed 3368.99 samples/sec Loss 6.1418 LearningRate 0.0635 Epoch: 4 Global Step: 50500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:50,649-Speed 3377.87 samples/sec Loss 
6.0129 LearningRate 0.0635 Epoch: 4 Global Step: 50510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:53,696-Speed 3362.51 samples/sec Loss 6.0128 LearningRate 0.0635 Epoch: 4 Global Step: 50520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:56,705-Speed 3404.23 samples/sec Loss 6.2077 LearningRate 0.0635 Epoch: 4 Global Step: 50530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:05:59,732-Speed 3384.49 samples/sec Loss 6.1466 LearningRate 0.0634 Epoch: 4 Global Step: 50540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:02,783-Speed 3357.11 samples/sec Loss 6.0423 LearningRate 0.0634 Epoch: 4 Global Step: 50550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:05,800-Speed 3395.21 samples/sec Loss 6.0318 LearningRate 0.0634 Epoch: 4 Global Step: 50560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:06:08,818-Speed 3393.74 samples/sec Loss 6.1336 LearningRate 0.0634 Epoch: 4 Global Step: 50570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:06:11,837-Speed 3393.53 samples/sec Loss 6.0878 LearningRate 0.0634 Epoch: 4 Global Step: 50580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:14,919-Speed 3323.39 samples/sec Loss 6.0995 LearningRate 0.0634 Epoch: 4 Global Step: 50590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:17,970-Speed 3357.12 samples/sec Loss 6.0726 LearningRate 0.0634 Epoch: 4 Global Step: 50600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:21,000-Speed 3380.44 samples/sec Loss 6.1050 LearningRate 0.0634 Epoch: 4 Global Step: 50610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:24,066-Speed 3341.33 samples/sec Loss 6.0826 LearningRate 0.0634 Epoch: 4 Global Step: 50620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:27,108-Speed 3367.26 samples/sec Loss 6.0434 LearningRate 0.0634 Epoch: 4 Global Step: 50630 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:30,153-Speed 3363.83 samples/sec Loss 6.1412 LearningRate 0.0634 Epoch: 4 Global Step: 50640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:33,177-Speed 3387.32 samples/sec Loss 6.1133 LearningRate 0.0634 Epoch: 4 Global Step: 50650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:36,302-Speed 3278.05 samples/sec Loss 6.1419 LearningRate 0.0634 Epoch: 4 Global Step: 50660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:39,381-Speed 3326.54 samples/sec Loss 6.1661 LearningRate 0.0634 Epoch: 4 Global Step: 50670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:42,437-Speed 3352.21 samples/sec Loss 6.1844 LearningRate 0.0634 Epoch: 4 Global Step: 50680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:45,476-Speed 3370.37 samples/sec Loss 6.1510 LearningRate 0.0634 Epoch: 4 Global Step: 50690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:48,502-Speed 3384.54 samples/sec Loss 6.1400 LearningRate 0.0633 Epoch: 4 Global Step: 50700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:51,526-Speed 3388.04 samples/sec Loss 6.1695 LearningRate 0.0633 Epoch: 4 Global Step: 50710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:54,582-Speed 3352.31 samples/sec Loss 6.1160 LearningRate 0.0633 Epoch: 4 Global Step: 50720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:06:57,593-Speed 3401.20 samples/sec Loss 6.1574 LearningRate 0.0633 Epoch: 4 Global Step: 50730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:00,680-Speed 3318.52 samples/sec Loss 6.1836 LearningRate 0.0633 Epoch: 4 Global Step: 50740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:03,742-Speed 3344.45 samples/sec Loss 6.1269 LearningRate 0.0633 Epoch: 4 Global Step: 50750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 
2022-04-27 06:07:06,791-Speed 3359.47 samples/sec Loss 6.2116 LearningRate 0.0633 Epoch: 4 Global Step: 50760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:09,811-Speed 3392.11 samples/sec Loss 6.2507 LearningRate 0.0633 Epoch: 4 Global Step: 50770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:12,861-Speed 3358.54 samples/sec Loss 6.1259 LearningRate 0.0633 Epoch: 4 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:07:15,924-Speed 3344.54 samples/sec Loss 6.1708 LearningRate 0.0633 Epoch: 4 Global Step: 50790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:07:18,998-Speed 3332.13 samples/sec Loss 6.3231 LearningRate 0.0633 Epoch: 4 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:07:22,061-Speed 3343.77 samples/sec Loss 6.0979 LearningRate 0.0633 Epoch: 4 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:07:25,070-Speed 3404.55 samples/sec Loss 6.2398 LearningRate 0.0633 Epoch: 4 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:07:28,158-Speed 3317.70 samples/sec Loss 6.1852 LearningRate 0.0633 Epoch: 4 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:07:31,266-Speed 3295.87 samples/sec Loss 6.1817 LearningRate 0.0633 Epoch: 4 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:07:34,369-Speed 3301.22 samples/sec Loss 6.1519 LearningRate 0.0633 Epoch: 4 Global Step: 50850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:37,487-Speed 3284.63 samples/sec Loss 6.1136 LearningRate 0.0632 Epoch: 4 Global Step: 50860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:40,596-Speed 3294.89 samples/sec Loss 6.2571 LearningRate 0.0632 Epoch: 4 Global Step: 50870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:43,623-Speed 3384.37 samples/sec Loss 
6.2671 LearningRate 0.0632 Epoch: 4 Global Step: 50880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:46,647-Speed 3386.95 samples/sec Loss 6.1101 LearningRate 0.0632 Epoch: 4 Global Step: 50890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:49,694-Speed 3361.66 samples/sec Loss 6.1542 LearningRate 0.0632 Epoch: 4 Global Step: 50900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:52,754-Speed 3347.56 samples/sec Loss 6.2145 LearningRate 0.0632 Epoch: 4 Global Step: 50910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:55,765-Speed 3401.36 samples/sec Loss 6.1343 LearningRate 0.0632 Epoch: 4 Global Step: 50920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:07:58,791-Speed 3385.03 samples/sec Loss 6.2091 LearningRate 0.0632 Epoch: 4 Global Step: 50930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:01,838-Speed 3361.88 samples/sec Loss 6.2584 LearningRate 0.0632 Epoch: 4 Global Step: 50940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:04,897-Speed 3349.20 samples/sec Loss 6.2616 LearningRate 0.0632 Epoch: 4 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:08:07,932-Speed 3374.63 samples/sec Loss 6.2888 LearningRate 0.0632 Epoch: 4 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:08:10,969-Speed 3372.33 samples/sec Loss 6.1620 LearningRate 0.0632 Epoch: 4 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:08:14,010-Speed 3368.70 samples/sec Loss 6.2789 LearningRate 0.0632 Epoch: 4 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:08:17,105-Speed 3310.08 samples/sec Loss 6.3284 LearningRate 0.0632 Epoch: 4 Global Step: 50990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:20,162-Speed 3350.44 samples/sec Loss 6.2204 LearningRate 0.0632 Epoch: 4 Global Step: 51000 
Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:23,266-Speed 3299.92 samples/sec Loss 6.2074 LearningRate 0.0631 Epoch: 4 Global Step: 51010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:26,376-Speed 3293.90 samples/sec Loss 6.2491 LearningRate 0.0631 Epoch: 4 Global Step: 51020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:29,428-Speed 3355.77 samples/sec Loss 6.2740 LearningRate 0.0631 Epoch: 4 Global Step: 51030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:32,487-Speed 3348.81 samples/sec Loss 6.3253 LearningRate 0.0631 Epoch: 4 Global Step: 51040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:35,532-Speed 3364.15 samples/sec Loss 6.2658 LearningRate 0.0631 Epoch: 4 Global Step: 51050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:38,546-Speed 3398.51 samples/sec Loss 6.2937 LearningRate 0.0631 Epoch: 4 Global Step: 51060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:41,694-Speed 3254.07 samples/sec Loss 6.2920 LearningRate 0.0631 Epoch: 4 Global Step: 51070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:44,756-Speed 3345.15 samples/sec Loss 6.2969 LearningRate 0.0631 Epoch: 4 Global Step: 51080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:08:47,814-Speed 3350.37 samples/sec Loss 6.4103 LearningRate 0.0631 Epoch: 4 Global Step: 51090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:08:50,848-Speed 3375.89 samples/sec Loss 6.2967 LearningRate 0.0631 Epoch: 4 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:08:53,915-Speed 3339.17 samples/sec Loss 6.2460 LearningRate 0.0631 Epoch: 4 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:08:56,917-Speed 3412.41 samples/sec Loss 6.2709 LearningRate 0.0631 Epoch: 4 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 06:08:59,979-Speed 3346.13 samples/sec Loss 6.2938 LearningRate 0.0631 Epoch: 4 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:03,103-Speed 3278.05 samples/sec Loss 6.3529 LearningRate 0.0631 Epoch: 4 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:06,125-Speed 3389.74 samples/sec Loss 6.2585 LearningRate 0.0631 Epoch: 4 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:09,119-Speed 3420.76 samples/sec Loss 6.2987 LearningRate 0.0631 Epoch: 4 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:12,177-Speed 3349.71 samples/sec Loss 6.2658 LearningRate 0.0630 Epoch: 4 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:15,191-Speed 3398.62 samples/sec Loss 6.3588 LearningRate 0.0630 Epoch: 4 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:18,257-Speed 3341.10 samples/sec Loss 6.4071 LearningRate 0.0630 Epoch: 4 Global Step: 51190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 06:09:21,269-Speed 3400.55 samples/sec Loss 6.2310 LearningRate 0.0630 Epoch: 4 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 06:09:24,285-Speed 3397.23 samples/sec Loss 6.3028 LearningRate 0.0630 Epoch: 4 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:27,421-Speed 3266.20 samples/sec Loss 6.2873 LearningRate 0.0630 Epoch: 4 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:30,480-Speed 3347.68 samples/sec Loss 6.3279 LearningRate 0.0630 Epoch: 4 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:33,534-Speed 3354.01 samples/sec Loss 6.3746 LearningRate 0.0630 Epoch: 4 Global Step: 51240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:36,596-Speed 3346.10 samples/sec 
Loss 6.3619 LearningRate 0.0630 Epoch: 4 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:39,664-Speed 3338.07 samples/sec Loss 6.2948 LearningRate 0.0630 Epoch: 4 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:42,741-Speed 3329.70 samples/sec Loss 6.4472 LearningRate 0.0630 Epoch: 4 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:45,753-Speed 3400.77 samples/sec Loss 6.2252 LearningRate 0.0630 Epoch: 4 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:48,807-Speed 3353.29 samples/sec Loss 6.2702 LearningRate 0.0630 Epoch: 4 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:51,860-Speed 3355.51 samples/sec Loss 6.3873 LearningRate 0.0630 Epoch: 4 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:54,917-Speed 3351.35 samples/sec Loss 6.2154 LearningRate 0.0630 Epoch: 4 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:09:57,948-Speed 3378.70 samples/sec Loss 6.3105 LearningRate 0.0630 Epoch: 4 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:10:00,976-Speed 3383.08 samples/sec Loss 6.4002 LearningRate 0.0629 Epoch: 4 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:10:04,029-Speed 3355.55 samples/sec Loss 6.1793 LearningRate 0.0629 Epoch: 4 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:10:07,066-Speed 3373.07 samples/sec Loss 6.3227 LearningRate 0.0629 Epoch: 4 Global Step: 51350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:10:10,059-Speed 3422.09 samples/sec Loss 6.4258 LearningRate 0.0629 Epoch: 4 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:13,143-Speed 3321.05 samples/sec Loss 6.4273 LearningRate 0.0629 Epoch: 4 Global Step: 
51370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:16,208-Speed 3342.16 samples/sec Loss 6.3158 LearningRate 0.0629 Epoch: 4 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:19,225-Speed 3395.51 samples/sec Loss 6.2834 LearningRate 0.0629 Epoch: 4 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:22,259-Speed 3375.39 samples/sec Loss 6.3046 LearningRate 0.0629 Epoch: 4 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:25,312-Speed 3355.07 samples/sec Loss 6.3444 LearningRate 0.0629 Epoch: 4 Global Step: 51410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:28,373-Speed 3347.34 samples/sec Loss 6.4690 LearningRate 0.0629 Epoch: 4 Global Step: 51420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:31,417-Speed 3364.04 samples/sec Loss 6.3435 LearningRate 0.0629 Epoch: 4 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:34,449-Speed 3378.81 samples/sec Loss 6.2625 LearningRate 0.0629 Epoch: 4 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:37,480-Speed 3379.93 samples/sec Loss 6.4097 LearningRate 0.0629 Epoch: 4 Global Step: 51450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:40,552-Speed 3334.90 samples/sec Loss 6.3961 LearningRate 0.0629 Epoch: 4 Global Step: 51460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:43,608-Speed 3351.31 samples/sec Loss 6.4085 LearningRate 0.0629 Epoch: 4 Global Step: 51470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:46,625-Speed 3395.28 samples/sec Loss 6.5130 LearningRate 0.0628 Epoch: 4 Global Step: 51480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:49,694-Speed 3338.40 samples/sec Loss 6.3251 LearningRate 0.0628 Epoch: 4 Global Step: 51490 Fp16 Grad Scale: 32768 Required: 17 hours 
Training: 2022-04-27 06:10:52,739-Speed 3364.26 samples/sec Loss 6.4515 LearningRate 0.0628 Epoch: 4 Global Step: 51500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:55,809-Speed 3336.80 samples/sec Loss 6.3762 LearningRate 0.0628 Epoch: 4 Global Step: 51510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:10:58,840-Speed 3379.31 samples/sec Loss 6.3981 LearningRate 0.0628 Epoch: 4 Global Step: 51520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:01,881-Speed 3368.71 samples/sec Loss 6.4151 LearningRate 0.0628 Epoch: 4 Global Step: 51530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:04,953-Speed 3334.88 samples/sec Loss 6.3375 LearningRate 0.0628 Epoch: 4 Global Step: 51540 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:08,044-Speed 3313.00 samples/sec Loss 6.4093 LearningRate 0.0628 Epoch: 4 Global Step: 51550 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:11,089-Speed 3363.87 samples/sec Loss 6.4683 LearningRate 0.0628 Epoch: 4 Global Step: 51560 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:14,162-Speed 3333.55 samples/sec Loss 6.4480 LearningRate 0.0628 Epoch: 4 Global Step: 51570 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:17,264-Speed 3302.05 samples/sec Loss 6.4171 LearningRate 0.0628 Epoch: 4 Global Step: 51580 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:20,275-Speed 3401.90 samples/sec Loss 6.4657 LearningRate 0.0628 Epoch: 4 Global Step: 51590 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:23,295-Speed 3392.50 samples/sec Loss 6.3220 LearningRate 0.0628 Epoch: 4 Global Step: 51600 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:26,350-Speed 3352.36 samples/sec Loss 6.3928 LearningRate 0.0628 Epoch: 4 Global Step: 51610 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:29,405-Speed 3353.19 
samples/sec Loss 6.4697 LearningRate 0.0628 Epoch: 4 Global Step: 51620 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:32,430-Speed 3385.94 samples/sec Loss 6.4406 LearningRate 0.0628 Epoch: 4 Global Step: 51630 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:11:35,474-Speed 3366.40 samples/sec Loss 6.5548 LearningRate 0.0627 Epoch: 4 Global Step: 51640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:38,525-Speed 3356.69 samples/sec Loss 6.4838 LearningRate 0.0627 Epoch: 4 Global Step: 51650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:41,525-Speed 3415.56 samples/sec Loss 6.5232 LearningRate 0.0627 Epoch: 4 Global Step: 51660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:44,583-Speed 3349.63 samples/sec Loss 6.4704 LearningRate 0.0627 Epoch: 4 Global Step: 51670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:47,665-Speed 3323.88 samples/sec Loss 6.4459 LearningRate 0.0627 Epoch: 4 Global Step: 51680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:50,686-Speed 3390.43 samples/sec Loss 6.3117 LearningRate 0.0627 Epoch: 4 Global Step: 51690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:53,745-Speed 3349.35 samples/sec Loss 6.4160 LearningRate 0.0627 Epoch: 4 Global Step: 51700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:56,737-Speed 3422.75 samples/sec Loss 6.3922 LearningRate 0.0627 Epoch: 4 Global Step: 51710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:11:59,809-Speed 3334.90 samples/sec Loss 6.4649 LearningRate 0.0627 Epoch: 4 Global Step: 51720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:12:02,857-Speed 3360.24 samples/sec Loss 6.5111 LearningRate 0.0627 Epoch: 4 Global Step: 51730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:12:05,857-Speed 3415.46 samples/sec Loss 6.4206 LearningRate 0.0627 Epoch: 4 
Global Step: 51740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:08,898-Speed 3367.68 samples/sec Loss 6.4608 LearningRate 0.0627 Epoch: 4 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:11,955-Speed 3351.98 samples/sec Loss 6.5915 LearningRate 0.0627 Epoch: 4 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:15,015-Speed 3346.58 samples/sec Loss 6.5215 LearningRate 0.0627 Epoch: 4 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:18,085-Speed 3337.53 samples/sec Loss 6.3249 LearningRate 0.0627 Epoch: 4 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:21,134-Speed 3358.81 samples/sec Loss 6.5361 LearningRate 0.0627 Epoch: 4 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:24,168-Speed 3375.99 samples/sec Loss 6.5466 LearningRate 0.0626 Epoch: 4 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:27,201-Speed 3377.79 samples/sec Loss 6.4231 LearningRate 0.0626 Epoch: 4 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:30,240-Speed 3370.45 samples/sec Loss 6.5309 LearningRate 0.0626 Epoch: 4 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:33,271-Speed 3379.25 samples/sec Loss 6.5105 LearningRate 0.0626 Epoch: 4 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:36,322-Speed 3357.83 samples/sec Loss 6.4955 LearningRate 0.0626 Epoch: 4 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:39,411-Speed 3315.75 samples/sec Loss 6.4185 LearningRate 0.0626 Epoch: 4 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:42,536-Speed 3278.08 samples/sec Loss 6.4363 LearningRate 0.0626 Epoch: 4 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-27 06:12:45,553-Speed 3395.11 samples/sec Loss 6.5339 LearningRate 0.0626 Epoch: 4 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:12:48,590-Speed 3373.10 samples/sec Loss 6.4904 LearningRate 0.0626 Epoch: 4 Global Step: 51880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:12:51,647-Speed 3350.64 samples/sec Loss 6.5449 LearningRate 0.0626 Epoch: 4 Global Step: 51890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:12:54,714-Speed 3340.36 samples/sec Loss 6.5674 LearningRate 0.0626 Epoch: 4 Global Step: 51900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:12:57,744-Speed 3379.85 samples/sec Loss 6.4623 LearningRate 0.0626 Epoch: 4 Global Step: 51910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:00,813-Speed 3338.67 samples/sec Loss 6.6449 LearningRate 0.0626 Epoch: 4 Global Step: 51920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:03,872-Speed 3347.85 samples/sec Loss 6.3682 LearningRate 0.0626 Epoch: 4 Global Step: 51930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:06,901-Speed 3381.94 samples/sec Loss 6.6305 LearningRate 0.0626 Epoch: 4 Global Step: 51940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:09,919-Speed 3394.67 samples/sec Loss 6.5420 LearningRate 0.0625 Epoch: 4 Global Step: 51950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:12,968-Speed 3359.64 samples/sec Loss 6.3897 LearningRate 0.0625 Epoch: 4 Global Step: 51960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:15,971-Speed 3410.82 samples/sec Loss 6.5447 LearningRate 0.0625 Epoch: 4 Global Step: 51970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:19,007-Speed 3374.56 samples/sec Loss 6.4871 LearningRate 0.0625 Epoch: 4 Global Step: 51980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:22,006-Speed 3415.57 
samples/sec Loss 6.5692 LearningRate 0.0625 Epoch: 4 Global Step: 51990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:25,054-Speed 3360.16 samples/sec Loss 6.6062 LearningRate 0.0625 Epoch: 4 Global Step: 52000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:28,125-Speed 3336.13 samples/sec Loss 6.5434 LearningRate 0.0625 Epoch: 4 Global Step: 52010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:31,175-Speed 3358.15 samples/sec Loss 6.5575 LearningRate 0.0625 Epoch: 4 Global Step: 52020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:34,186-Speed 3401.82 samples/sec Loss 6.3896 LearningRate 0.0625 Epoch: 4 Global Step: 52030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:37,240-Speed 3354.59 samples/sec Loss 6.5944 LearningRate 0.0625 Epoch: 4 Global Step: 52040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:40,323-Speed 3322.25 samples/sec Loss 6.5863 LearningRate 0.0625 Epoch: 4 Global Step: 52050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:43,393-Speed 3335.83 samples/sec Loss 6.3844 LearningRate 0.0625 Epoch: 4 Global Step: 52060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:46,407-Speed 3398.92 samples/sec Loss 6.4886 LearningRate 0.0625 Epoch: 4 Global Step: 52070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:49,408-Speed 3412.97 samples/sec Loss 6.6188 LearningRate 0.0625 Epoch: 4 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:13:52,480-Speed 3334.90 samples/sec Loss 6.4312 LearningRate 0.0625 Epoch: 4 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:13:55,526-Speed 3362.65 samples/sec Loss 6.5764 LearningRate 0.0625 Epoch: 4 Global Step: 52100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:13:58,581-Speed 3353.56 samples/sec Loss 6.5868 LearningRate 0.0624 Epoch: 4 
Global Step: 52110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:01,686-Speed 3298.35 samples/sec Loss 6.5248 LearningRate 0.0624 Epoch: 4 Global Step: 52120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:04,811-Speed 3278.81 samples/sec Loss 6.5138 LearningRate 0.0624 Epoch: 4 Global Step: 52130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:07,837-Speed 3384.19 samples/sec Loss 6.5172 LearningRate 0.0624 Epoch: 4 Global Step: 52140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:10,857-Speed 3392.09 samples/sec Loss 6.5577 LearningRate 0.0624 Epoch: 4 Global Step: 52150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:13,920-Speed 3344.15 samples/sec Loss 6.4440 LearningRate 0.0624 Epoch: 4 Global Step: 52160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:16,979-Speed 3349.12 samples/sec Loss 6.6316 LearningRate 0.0624 Epoch: 4 Global Step: 52170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:20,059-Speed 3325.60 samples/sec Loss 6.4865 LearningRate 0.0624 Epoch: 4 Global Step: 52180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:23,102-Speed 3366.06 samples/sec Loss 6.5725 LearningRate 0.0624 Epoch: 4 Global Step: 52190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:26,177-Speed 3330.95 samples/sec Loss 6.5611 LearningRate 0.0624 Epoch: 4 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:14:29,244-Speed 3340.18 samples/sec Loss 6.5025 LearningRate 0.0624 Epoch: 4 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:14:32,311-Speed 3340.06 samples/sec Loss 6.5266 LearningRate 0.0624 Epoch: 4 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:14:35,317-Speed 3407.37 samples/sec Loss 6.5269 LearningRate 0.0624 Epoch: 4 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-27 06:14:38,329-Speed 3400.91 samples/sec Loss 6.5953 LearningRate 0.0624 Epoch: 4 Global Step: 52240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:41,362-Speed 3377.79 samples/sec Loss 6.4939 LearningRate 0.0624 Epoch: 4 Global Step: 52250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:44,360-Speed 3416.42 samples/sec Loss 6.4184 LearningRate 0.0624 Epoch: 4 Global Step: 52260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:47,436-Speed 3330.68 samples/sec Loss 6.6780 LearningRate 0.0623 Epoch: 4 Global Step: 52270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:50,519-Speed 3321.83 samples/sec Loss 6.3826 LearningRate 0.0623 Epoch: 4 Global Step: 52280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:53,626-Speed 3297.16 samples/sec Loss 6.5166 LearningRate 0.0623 Epoch: 4 Global Step: 52290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:56,686-Speed 3347.99 samples/sec Loss 6.5324 LearningRate 0.0623 Epoch: 4 Global Step: 52300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:14:59,745-Speed 3347.86 samples/sec Loss 6.5606 LearningRate 0.0623 Epoch: 4 Global Step: 52310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:02,848-Speed 3301.22 samples/sec Loss 6.6353 LearningRate 0.0623 Epoch: 4 Global Step: 52320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:05,907-Speed 3349.64 samples/sec Loss 6.6835 LearningRate 0.0623 Epoch: 4 Global Step: 52330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:08,903-Speed 3418.62 samples/sec Loss 6.6224 LearningRate 0.0623 Epoch: 4 Global Step: 52340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:11,935-Speed 3378.49 samples/sec Loss 6.4453 LearningRate 0.0623 Epoch: 4 Global Step: 52350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:14,967-Speed 3377.88 
samples/sec Loss 6.5218 LearningRate 0.0623 Epoch: 4 Global Step: 52360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:18,052-Speed 3320.74 samples/sec Loss 6.5273 LearningRate 0.0623 Epoch: 4 Global Step: 52370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:21,076-Speed 3387.17 samples/sec Loss 6.5745 LearningRate 0.0623 Epoch: 4 Global Step: 52380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:24,132-Speed 3352.14 samples/sec Loss 6.6037 LearningRate 0.0623 Epoch: 4 Global Step: 52390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:27,179-Speed 3360.95 samples/sec Loss 6.6340 LearningRate 0.0623 Epoch: 4 Global Step: 52400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:30,237-Speed 3349.58 samples/sec Loss 6.5461 LearningRate 0.0623 Epoch: 4 Global Step: 52410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:33,288-Speed 3357.98 samples/sec Loss 6.6451 LearningRate 0.0622 Epoch: 4 Global Step: 52420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:36,388-Speed 3304.54 samples/sec Loss 6.6230 LearningRate 0.0622 Epoch: 4 Global Step: 52430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:39,463-Speed 3331.28 samples/sec Loss 6.6438 LearningRate 0.0622 Epoch: 4 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:15:42,498-Speed 3375.24 samples/sec Loss 6.5948 LearningRate 0.0622 Epoch: 4 Global Step: 52450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:45,511-Speed 3399.32 samples/sec Loss 6.6332 LearningRate 0.0622 Epoch: 4 Global Step: 52460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:48,536-Speed 3386.31 samples/sec Loss 6.5620 LearningRate 0.0622 Epoch: 4 Global Step: 52470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:51,570-Speed 3375.71 samples/sec Loss 6.4674 LearningRate 0.0622 Epoch: 4 
Global Step: 52480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:54,614-Speed 3365.11 samples/sec Loss 6.4848 LearningRate 0.0622 Epoch: 4 Global Step: 52490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:15:57,641-Speed 3384.68 samples/sec Loss 6.5501 LearningRate 0.0622 Epoch: 4 Global Step: 52500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:00,767-Speed 3276.08 samples/sec Loss 6.6624 LearningRate 0.0622 Epoch: 4 Global Step: 52510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:03,770-Speed 3410.75 samples/sec Loss 6.5005 LearningRate 0.0622 Epoch: 4 Global Step: 52520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:06,789-Speed 3393.50 samples/sec Loss 6.7577 LearningRate 0.0622 Epoch: 4 Global Step: 52530 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:09,813-Speed 3387.74 samples/sec Loss 6.6664 LearningRate 0.0622 Epoch: 4 Global Step: 52540 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:12,830-Speed 3395.60 samples/sec Loss 6.7037 LearningRate 0.0622 Epoch: 4 Global Step: 52550 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:15,830-Speed 3414.34 samples/sec Loss 6.6636 LearningRate 0.0622 Epoch: 4 Global Step: 52560 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:18,896-Speed 3341.33 samples/sec Loss 6.6014 LearningRate 0.0622 Epoch: 4 Global Step: 52570 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:21,953-Speed 3350.54 samples/sec Loss 6.6686 LearningRate 0.0621 Epoch: 4 Global Step: 52580 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:25,007-Speed 3354.42 samples/sec Loss 6.6359 LearningRate 0.0621 Epoch: 4 Global Step: 52590 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:28,033-Speed 3384.03 samples/sec Loss 6.5551 LearningRate 0.0621 Epoch: 4 Global Step: 52600 Fp16 Grad Scale: 16384 Required: 17 
hours Training: 2022-04-27 06:16:31,074-Speed 3369.55 samples/sec Loss 6.6503 LearningRate 0.0621 Epoch: 4 Global Step: 52610 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:34,103-Speed 3381.27 samples/sec Loss 6.5444 LearningRate 0.0621 Epoch: 4 Global Step: 52620 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:16:37,193-Speed 3315.33 samples/sec Loss 6.5383 LearningRate 0.0621 Epoch: 4 Global Step: 52630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:40,219-Speed 3385.48 samples/sec Loss 6.6157 LearningRate 0.0621 Epoch: 4 Global Step: 52640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:43,276-Speed 3350.74 samples/sec Loss 6.5336 LearningRate 0.0621 Epoch: 4 Global Step: 52650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:46,314-Speed 3371.84 samples/sec Loss 6.6646 LearningRate 0.0621 Epoch: 4 Global Step: 52660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:49,408-Speed 3309.57 samples/sec Loss 6.5835 LearningRate 0.0621 Epoch: 4 Global Step: 52670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:52,444-Speed 3374.14 samples/sec Loss 6.6556 LearningRate 0.0621 Epoch: 4 Global Step: 52680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:55,444-Speed 3415.01 samples/sec Loss 6.6327 LearningRate 0.0621 Epoch: 4 Global Step: 52690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:16:58,438-Speed 3421.37 samples/sec Loss 6.4961 LearningRate 0.0621 Epoch: 4 Global Step: 52700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:01,484-Speed 3362.85 samples/sec Loss 6.6698 LearningRate 0.0621 Epoch: 4 Global Step: 52710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:04,535-Speed 3357.83 samples/sec Loss 6.5600 LearningRate 0.0621 Epoch: 4 Global Step: 52720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:07,590-Speed 3352.57 
samples/sec Loss 6.5210 LearningRate 0.0621 Epoch: 4 Global Step: 52730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:17:10,599-Speed 3404.11 samples/sec Loss 6.6028 LearningRate 0.0620 Epoch: 4 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:17:13,659-Speed 3347.23 samples/sec Loss 6.7227 LearningRate 0.0620 Epoch: 4 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:17:16,662-Speed 3412.49 samples/sec Loss 6.6006 LearningRate 0.0620 Epoch: 4 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:17:19,697-Speed 3374.04 samples/sec Loss 6.7280 LearningRate 0.0620 Epoch: 4 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:17:22,738-Speed 3368.56 samples/sec Loss 6.6007 LearningRate 0.0620 Epoch: 4 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:17:25,750-Speed 3401.87 samples/sec Loss 6.5982 LearningRate 0.0620 Epoch: 4 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:17:28,746-Speed 3418.29 samples/sec Loss 6.5630 LearningRate 0.0620 Epoch: 4 Global Step: 52800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:31,789-Speed 3366.74 samples/sec Loss 6.5698 LearningRate 0.0620 Epoch: 4 Global Step: 52810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:34,848-Speed 3348.58 samples/sec Loss 6.6464 LearningRate 0.0620 Epoch: 4 Global Step: 52820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:37,874-Speed 3384.46 samples/sec Loss 6.6042 LearningRate 0.0620 Epoch: 4 Global Step: 52830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:40,898-Speed 3387.97 samples/sec Loss 6.6064 LearningRate 0.0620 Epoch: 4 Global Step: 52840 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:43,901-Speed 3411.01 samples/sec Loss 6.7389 LearningRate 0.0620 Epoch: 4 
Global Step: 52850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:46,923-Speed 3389.98 samples/sec Loss 6.7301 LearningRate 0.0620 Epoch: 4 Global Step: 52860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:50,006-Speed 3322.25 samples/sec Loss 6.6167 LearningRate 0.0620 Epoch: 4 Global Step: 52870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:53,063-Speed 3350.78 samples/sec Loss 6.6601 LearningRate 0.0620 Epoch: 4 Global Step: 52880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:56,117-Speed 3354.04 samples/sec Loss 6.6906 LearningRate 0.0620 Epoch: 4 Global Step: 52890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:17:59,188-Speed 3335.57 samples/sec Loss 6.6407 LearningRate 0.0619 Epoch: 4 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:18:02,221-Speed 3377.45 samples/sec Loss 6.6183 LearningRate 0.0619 Epoch: 4 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:18:05,230-Speed 3404.69 samples/sec Loss 6.6986 LearningRate 0.0619 Epoch: 4 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:18:08,226-Speed 3418.81 samples/sec Loss 6.6679 LearningRate 0.0619 Epoch: 4 Global Step: 52930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:11,275-Speed 3359.09 samples/sec Loss 6.6127 LearningRate 0.0619 Epoch: 4 Global Step: 52940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:14,341-Speed 3341.30 samples/sec Loss 6.6502 LearningRate 0.0619 Epoch: 4 Global Step: 52950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:17,365-Speed 3387.54 samples/sec Loss 6.6295 LearningRate 0.0619 Epoch: 4 Global Step: 52960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:20,366-Speed 3413.19 samples/sec Loss 6.7360 LearningRate 0.0619 Epoch: 4 Global Step: 52970 Fp16 Grad Scale: 32768 Required: 17 
hours Training: 2022-04-27 06:18:23,413-Speed 3361.44 samples/sec Loss 6.7470 LearningRate 0.0619 Epoch: 4 Global Step: 52980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:26,484-Speed 3336.16 samples/sec Loss 6.6278 LearningRate 0.0619 Epoch: 4 Global Step: 52990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:29,519-Speed 3374.51 samples/sec Loss 6.7165 LearningRate 0.0619 Epoch: 4 Global Step: 53000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:32,542-Speed 3389.73 samples/sec Loss 6.6252 LearningRate 0.0619 Epoch: 4 Global Step: 53010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:35,584-Speed 3366.79 samples/sec Loss 6.6728 LearningRate 0.0619 Epoch: 4 Global Step: 53020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:38,676-Speed 3312.92 samples/sec Loss 6.6165 LearningRate 0.0619 Epoch: 4 Global Step: 53030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:18:41,756-Speed 3324.98 samples/sec Loss 6.6882 LearningRate 0.0619 Epoch: 4 Global Step: 53040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:18:44,759-Speed 3411.16 samples/sec Loss 6.6242 LearningRate 0.0619 Epoch: 4 Global Step: 53050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:47,791-Speed 3378.73 samples/sec Loss 6.6871 LearningRate 0.0618 Epoch: 4 Global Step: 53060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:50,895-Speed 3299.90 samples/sec Loss 6.6545 LearningRate 0.0618 Epoch: 4 Global Step: 53070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:53,959-Speed 3343.33 samples/sec Loss 6.7056 LearningRate 0.0618 Epoch: 4 Global Step: 53080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:18:57,008-Speed 3359.44 samples/sec Loss 6.5633 LearningRate 0.0618 Epoch: 4 Global Step: 53090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:00,029-Speed 3390.58 
samples/sec Loss 6.7089 LearningRate 0.0618 Epoch: 4 Global Step: 53100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:03,093-Speed 3343.75 samples/sec Loss 6.6911 LearningRate 0.0618 Epoch: 4 Global Step: 53110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:06,115-Speed 3388.74 samples/sec Loss 6.6997 LearningRate 0.0618 Epoch: 4 Global Step: 53120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:09,120-Speed 3409.47 samples/sec Loss 6.6224 LearningRate 0.0618 Epoch: 4 Global Step: 53130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:12,136-Speed 3396.44 samples/sec Loss 6.6003 LearningRate 0.0618 Epoch: 4 Global Step: 53140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:15,203-Speed 3339.41 samples/sec Loss 6.6368 LearningRate 0.0618 Epoch: 4 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:19:18,303-Speed 3304.51 samples/sec Loss 6.7129 LearningRate 0.0618 Epoch: 4 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:19:21,317-Speed 3398.99 samples/sec Loss 6.7478 LearningRate 0.0618 Epoch: 4 Global Step: 53170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:24,398-Speed 3324.49 samples/sec Loss 6.7546 LearningRate 0.0618 Epoch: 4 Global Step: 53180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:27,497-Speed 3305.94 samples/sec Loss 6.7183 LearningRate 0.0618 Epoch: 4 Global Step: 53190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:30,570-Speed 3332.72 samples/sec Loss 6.6873 LearningRate 0.0618 Epoch: 4 Global Step: 53200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:33,571-Speed 3413.82 samples/sec Loss 6.7375 LearningRate 0.0617 Epoch: 4 Global Step: 53210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:36,637-Speed 3341.27 samples/sec Loss 6.6505 LearningRate 0.0617 Epoch: 4 
Global Step: 53220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:39,711-Speed 3331.35 samples/sec Loss 6.6702 LearningRate 0.0617 Epoch: 4 Global Step: 53230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:42,747-Speed 3374.06 samples/sec Loss 6.8159 LearningRate 0.0617 Epoch: 4 Global Step: 53240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:45,771-Speed 3388.08 samples/sec Loss 6.6095 LearningRate 0.0617 Epoch: 4 Global Step: 53250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:48,798-Speed 3383.68 samples/sec Loss 6.5472 LearningRate 0.0617 Epoch: 4 Global Step: 53260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:19:51,884-Speed 3319.62 samples/sec Loss 6.7021 LearningRate 0.0617 Epoch: 4 Global Step: 53270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:19:54,950-Speed 3340.49 samples/sec Loss 6.7360 LearningRate 0.0617 Epoch: 4 Global Step: 53280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:19:57,955-Speed 3409.39 samples/sec Loss 6.7148 LearningRate 0.0617 Epoch: 4 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:20:00,974-Speed 3391.93 samples/sec Loss 6.6544 LearningRate 0.0617 Epoch: 4 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:20:04,044-Speed 3336.88 samples/sec Loss 6.7017 LearningRate 0.0617 Epoch: 4 Global Step: 53310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:07,068-Speed 3387.32 samples/sec Loss 6.6050 LearningRate 0.0617 Epoch: 4 Global Step: 53320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:10,074-Speed 3407.77 samples/sec Loss 6.6211 LearningRate 0.0617 Epoch: 4 Global Step: 53330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:13,143-Speed 3338.28 samples/sec Loss 6.7565 LearningRate 0.0617 Epoch: 4 Global Step: 53340 Fp16 Grad Scale: 32768 Required: 17 
hours Training: 2022-04-27 06:20:16,221-Speed 3327.97 samples/sec Loss 6.7200 LearningRate 0.0617 Epoch: 4 Global Step: 53350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:19,295-Speed 3331.89 samples/sec Loss 6.7495 LearningRate 0.0617 Epoch: 4 Global Step: 53360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:22,300-Speed 3408.40 samples/sec Loss 6.6112 LearningRate 0.0616 Epoch: 4 Global Step: 53370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:25,312-Speed 3401.08 samples/sec Loss 6.7133 LearningRate 0.0616 Epoch: 4 Global Step: 53380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:28,395-Speed 3322.81 samples/sec Loss 6.8569 LearningRate 0.0616 Epoch: 4 Global Step: 53390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:31,414-Speed 3392.98 samples/sec Loss 6.6806 LearningRate 0.0616 Epoch: 4 Global Step: 53400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:34,482-Speed 3338.79 samples/sec Loss 6.6819 LearningRate 0.0616 Epoch: 4 Global Step: 53410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:20:37,554-Speed 3334.23 samples/sec Loss 6.7400 LearningRate 0.0616 Epoch: 4 Global Step: 53420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:20:40,609-Speed 3353.52 samples/sec Loss 6.7259 LearningRate 0.0616 Epoch: 4 Global Step: 53430 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:20:43,671-Speed 3344.70 samples/sec Loss 6.7451 LearningRate 0.0616 Epoch: 4 Global Step: 53440 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:20:46,665-Speed 3420.85 samples/sec Loss 6.6623 LearningRate 0.0616 Epoch: 4 Global Step: 53450 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:20:49,752-Speed 3318.28 samples/sec Loss 6.8562 LearningRate 0.0616 Epoch: 4 Global Step: 53460 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:20:52,791-Speed 3370.57 
samples/sec Loss 6.5693 LearningRate 0.0616 Epoch: 4 Global Step: 53470 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:20:55,854-Speed 3344.18 samples/sec Loss 6.6679 LearningRate 0.0616 Epoch: 4 Global Step: 53480 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:20:58,891-Speed 3373.23 samples/sec Loss 6.6423 LearningRate 0.0616 Epoch: 4 Global Step: 53490 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:21:01,968-Speed 3329.38 samples/sec Loss 6.7911 LearningRate 0.0616 Epoch: 4 Global Step: 53500 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:21:05,056-Speed 3316.64 samples/sec Loss 6.6038 LearningRate 0.0616 Epoch: 4 Global Step: 53510 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:21:08,084-Speed 3383.08 samples/sec Loss 6.7124 LearningRate 0.0616 Epoch: 4 Global Step: 53520 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:21:11,081-Speed 3417.71 samples/sec Loss 6.7854 LearningRate 0.0615 Epoch: 4 Global Step: 53530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:14,170-Speed 3315.50 samples/sec Loss 6.7184 LearningRate 0.0615 Epoch: 4 Global Step: 53540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:17,238-Speed 3339.21 samples/sec Loss 6.6434 LearningRate 0.0615 Epoch: 4 Global Step: 53550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:20,267-Speed 3381.34 samples/sec Loss 6.7538 LearningRate 0.0615 Epoch: 4 Global Step: 53560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:23,364-Speed 3307.68 samples/sec Loss 6.7280 LearningRate 0.0615 Epoch: 4 Global Step: 53570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:26,457-Speed 3312.05 samples/sec Loss 6.8778 LearningRate 0.0615 Epoch: 4 Global Step: 53580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:29,513-Speed 3351.37 samples/sec Loss 6.7981 LearningRate 0.0615 Epoch: 4 
Global Step: 53590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:32,516-Speed 3411.61 samples/sec Loss 6.7482 LearningRate 0.0615 Epoch: 4 Global Step: 53600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:35,601-Speed 3320.48 samples/sec Loss 6.6538 LearningRate 0.0615 Epoch: 4 Global Step: 53610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:38,646-Speed 3364.08 samples/sec Loss 6.7008 LearningRate 0.0615 Epoch: 4 Global Step: 53620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:21:41,686-Speed 3368.55 samples/sec Loss 6.7213 LearningRate 0.0615 Epoch: 4 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:21:44,712-Speed 3385.93 samples/sec Loss 6.7242 LearningRate 0.0615 Epoch: 4 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:21:47,782-Speed 3336.91 samples/sec Loss 6.8224 LearningRate 0.0615 Epoch: 4 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:21:50,817-Speed 3375.10 samples/sec Loss 6.6882 LearningRate 0.0615 Epoch: 4 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:21:53,858-Speed 3368.24 samples/sec Loss 6.8121 LearningRate 0.0615 Epoch: 4 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:21:56,853-Speed 3419.46 samples/sec Loss 6.6688 LearningRate 0.0615 Epoch: 4 Global Step: 53680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:21:59,887-Speed 3376.51 samples/sec Loss 6.7579 LearningRate 0.0614 Epoch: 4 Global Step: 53690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:22:02,957-Speed 3336.94 samples/sec Loss 6.6243 LearningRate 0.0614 Epoch: 4 Global Step: 53700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:22:05,967-Speed 3402.70 samples/sec Loss 6.6259 LearningRate 0.0614 Epoch: 4 Global Step: 53710 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-27 06:22:08,981-Speed 3398.81 samples/sec Loss 6.6580 LearningRate 0.0614 Epoch: 4 Global Step: 53720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:22:11,998-Speed 3394.85 samples/sec Loss 6.7690 LearningRate 0.0614 Epoch: 4 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:22:14,982-Speed 3433.91 samples/sec Loss 6.7174 LearningRate 0.0614 Epoch: 4 Global Step: 53740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:17,996-Speed 3398.23 samples/sec Loss 6.7255 LearningRate 0.0614 Epoch: 4 Global Step: 53750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:21,023-Speed 3384.39 samples/sec Loss 6.7017 LearningRate 0.0614 Epoch: 4 Global Step: 53760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:24,043-Speed 3391.31 samples/sec Loss 6.8105 LearningRate 0.0614 Epoch: 4 Global Step: 53770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:27,091-Speed 3361.09 samples/sec Loss 6.7868 LearningRate 0.0614 Epoch: 4 Global Step: 53780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:30,111-Speed 3391.52 samples/sec Loss 6.8100 LearningRate 0.0614 Epoch: 4 Global Step: 53790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:33,152-Speed 3368.45 samples/sec Loss 6.7448 LearningRate 0.0614 Epoch: 4 Global Step: 53800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:36,218-Speed 3340.89 samples/sec Loss 6.7787 LearningRate 0.0614 Epoch: 4 Global Step: 53810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:39,260-Speed 3367.12 samples/sec Loss 6.7858 LearningRate 0.0614 Epoch: 4 Global Step: 53820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:42,273-Speed 3399.70 samples/sec Loss 6.7507 LearningRate 0.0614 Epoch: 4 Global Step: 53830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:45,307-Speed 3376.45 
samples/sec Loss 6.5959 LearningRate 0.0614 Epoch: 4 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:22:48,364-Speed 3351.06 samples/sec Loss 6.7770 LearningRate 0.0613 Epoch: 4 Global Step: 53850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:51,453-Speed 3316.19 samples/sec Loss 6.6932 LearningRate 0.0613 Epoch: 4 Global Step: 53860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:54,486-Speed 3377.47 samples/sec Loss 6.8256 LearningRate 0.0613 Epoch: 4 Global Step: 53870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:22:57,500-Speed 3398.77 samples/sec Loss 6.7208 LearningRate 0.0613 Epoch: 4 Global Step: 53880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:00,553-Speed 3354.80 samples/sec Loss 6.7550 LearningRate 0.0613 Epoch: 4 Global Step: 53890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:03,585-Speed 3378.76 samples/sec Loss 6.8410 LearningRate 0.0613 Epoch: 4 Global Step: 53900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:06,592-Speed 3406.46 samples/sec Loss 6.7931 LearningRate 0.0613 Epoch: 4 Global Step: 53910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:09,583-Speed 3424.68 samples/sec Loss 6.7551 LearningRate 0.0613 Epoch: 4 Global Step: 53920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:12,650-Speed 3339.18 samples/sec Loss 6.6210 LearningRate 0.0613 Epoch: 4 Global Step: 53930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:15,718-Speed 3339.06 samples/sec Loss 6.7153 LearningRate 0.0613 Epoch: 4 Global Step: 53940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:18,780-Speed 3345.44 samples/sec Loss 6.7619 LearningRate 0.0613 Epoch: 4 Global Step: 53950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:23:21,783-Speed 3410.54 samples/sec Loss 6.7466 LearningRate 0.0613 Epoch: 4 
Global Step: 53960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:23:24,789-Speed 3408.29 samples/sec Loss 6.8397 LearningRate 0.0613 Epoch: 4 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:23:27,816-Speed 3383.54 samples/sec Loss 6.8322 LearningRate 0.0613 Epoch: 4 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:23:30,836-Speed 3392.39 samples/sec Loss 6.8203 LearningRate 0.0613 Epoch: 4 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:23:33,881-Speed 3364.14 samples/sec Loss 6.8876 LearningRate 0.0613 Epoch: 4 Global Step: 54000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:23:36,974-Speed 3311.82 samples/sec Loss 6.8210 LearningRate 0.0612 Epoch: 4 Global Step: 54010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:40,017-Speed 3365.81 samples/sec Loss 6.8049 LearningRate 0.0612 Epoch: 4 Global Step: 54020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:43,155-Speed 3263.74 samples/sec Loss 6.8333 LearningRate 0.0612 Epoch: 4 Global Step: 54030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:46,175-Speed 3392.78 samples/sec Loss 6.7921 LearningRate 0.0612 Epoch: 4 Global Step: 54040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:49,266-Speed 3313.35 samples/sec Loss 6.8464 LearningRate 0.0612 Epoch: 4 Global Step: 54050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:52,330-Speed 3343.08 samples/sec Loss 6.7626 LearningRate 0.0612 Epoch: 4 Global Step: 54060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:55,364-Speed 3376.48 samples/sec Loss 6.7767 LearningRate 0.0612 Epoch: 4 Global Step: 54070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:23:58,429-Speed 3341.42 samples/sec Loss 6.7596 LearningRate 0.0612 Epoch: 4 Global Step: 54080 Fp16 Grad Scale: 32768 Required: 17 
hours Training: 2022-04-27 06:24:01,495-Speed 3341.78 samples/sec Loss 6.8625 LearningRate 0.0612 Epoch: 4 Global Step: 54090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:24:04,545-Speed 3358.13 samples/sec Loss 6.8132 LearningRate 0.0612 Epoch: 4 Global Step: 54100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:24:07,618-Speed 3332.92 samples/sec Loss 6.8850 LearningRate 0.0612 Epoch: 4 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:24:10,665-Speed 3361.93 samples/sec Loss 6.7419 LearningRate 0.0612 Epoch: 4 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:24:13,697-Speed 3378.65 samples/sec Loss 6.7911 LearningRate 0.0612 Epoch: 4 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:24:16,743-Speed 3362.26 samples/sec Loss 6.8285 LearningRate 0.0612 Epoch: 4 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:24:19,763-Speed 3391.50 samples/sec Loss 6.8302 LearningRate 0.0612 Epoch: 4 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:24:22,772-Speed 3404.63 samples/sec Loss 6.8241 LearningRate 0.0611 Epoch: 4 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:24:25,766-Speed 3421.49 samples/sec Loss 6.8684 LearningRate 0.0611 Epoch: 4 Global Step: 54170 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:28,824-Speed 3349.32 samples/sec Loss 6.8416 LearningRate 0.0611 Epoch: 4 Global Step: 54180 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:31,927-Speed 3301.88 samples/sec Loss 6.8553 LearningRate 0.0611 Epoch: 4 Global Step: 54190 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:34,941-Speed 3397.69 samples/sec Loss 6.7763 LearningRate 0.0611 Epoch: 4 Global Step: 54200 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:38,039-Speed 3306.75 
samples/sec Loss 6.9252 LearningRate 0.0611 Epoch: 4 Global Step: 54210 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:41,122-Speed 3322.89 samples/sec Loss 6.8336 LearningRate 0.0611 Epoch: 4 Global Step: 54220 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:44,140-Speed 3393.49 samples/sec Loss 6.8921 LearningRate 0.0611 Epoch: 4 Global Step: 54230 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:47,165-Speed 3386.62 samples/sec Loss 6.7608 LearningRate 0.0611 Epoch: 4 Global Step: 54240 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:50,245-Speed 3325.46 samples/sec Loss 6.8681 LearningRate 0.0611 Epoch: 4 Global Step: 54250 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:53,336-Speed 3313.79 samples/sec Loss 6.7591 LearningRate 0.0611 Epoch: 4 Global Step: 54260 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:24:56,344-Speed 3405.97 samples/sec Loss 6.7915 LearningRate 0.0611 Epoch: 4 Global Step: 54270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:24:59,353-Speed 3403.57 samples/sec Loss 6.8847 LearningRate 0.0611 Epoch: 4 Global Step: 54280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:02,404-Speed 3358.14 samples/sec Loss 6.8980 LearningRate 0.0611 Epoch: 4 Global Step: 54290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:05,433-Speed 3380.90 samples/sec Loss 6.8601 LearningRate 0.0611 Epoch: 4 Global Step: 54300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:08,467-Speed 3375.99 samples/sec Loss 6.8559 LearningRate 0.0611 Epoch: 4 Global Step: 54310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:11,484-Speed 3395.37 samples/sec Loss 6.7877 LearningRate 0.0610 Epoch: 4 Global Step: 54320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:14,534-Speed 3358.62 samples/sec Loss 6.8065 LearningRate 0.0610 Epoch: 4 
Global Step: 54330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:17,574-Speed 3369.52 samples/sec Loss 6.7694 LearningRate 0.0610 Epoch: 4 Global Step: 54340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:20,587-Speed 3399.99 samples/sec Loss 6.8129 LearningRate 0.0610 Epoch: 4 Global Step: 54350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:23,585-Speed 3416.08 samples/sec Loss 6.7973 LearningRate 0.0610 Epoch: 4 Global Step: 54360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:26,632-Speed 3361.83 samples/sec Loss 6.7639 LearningRate 0.0610 Epoch: 4 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:25:29,641-Speed 3404.19 samples/sec Loss 6.6843 LearningRate 0.0610 Epoch: 4 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:25:32,673-Speed 3378.58 samples/sec Loss 6.8349 LearningRate 0.0610 Epoch: 4 Global Step: 54390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:35,685-Speed 3400.73 samples/sec Loss 6.7463 LearningRate 0.0610 Epoch: 4 Global Step: 54400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:38,721-Speed 3374.33 samples/sec Loss 6.9029 LearningRate 0.0610 Epoch: 4 Global Step: 54410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:41,805-Speed 3321.44 samples/sec Loss 6.6719 LearningRate 0.0610 Epoch: 4 Global Step: 54420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:44,821-Speed 3395.97 samples/sec Loss 6.8509 LearningRate 0.0610 Epoch: 4 Global Step: 54430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:47,852-Speed 3379.80 samples/sec Loss 6.9617 LearningRate 0.0610 Epoch: 4 Global Step: 54440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:50,882-Speed 3380.38 samples/sec Loss 6.6932 LearningRate 0.0610 Epoch: 4 Global Step: 54450 Fp16 Grad Scale: 32768 Required: 17 
hours Training: 2022-04-27 06:25:53,967-Speed 3320.44 samples/sec Loss 6.8162 LearningRate 0.0610 Epoch: 4 Global Step: 54460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:25:57,002-Speed 3375.10 samples/sec Loss 6.8308 LearningRate 0.0610 Epoch: 4 Global Step: 54470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:00,013-Speed 3402.31 samples/sec Loss 6.8478 LearningRate 0.0609 Epoch: 4 Global Step: 54480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:03,079-Speed 3340.75 samples/sec Loss 6.8113 LearningRate 0.0609 Epoch: 4 Global Step: 54490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:06,101-Speed 3388.83 samples/sec Loss 6.8763 LearningRate 0.0609 Epoch: 4 Global Step: 54500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:09,112-Speed 3402.72 samples/sec Loss 6.7882 LearningRate 0.0609 Epoch: 4 Global Step: 54510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:12,146-Speed 3375.84 samples/sec Loss 6.8869 LearningRate 0.0609 Epoch: 4 Global Step: 54520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:15,198-Speed 3356.03 samples/sec Loss 6.7613 LearningRate 0.0609 Epoch: 4 Global Step: 54530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:18,290-Speed 3312.76 samples/sec Loss 6.8042 LearningRate 0.0609 Epoch: 4 Global Step: 54540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:21,285-Speed 3420.61 samples/sec Loss 6.8150 LearningRate 0.0609 Epoch: 4 Global Step: 54550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:24,290-Speed 3408.52 samples/sec Loss 6.8870 LearningRate 0.0609 Epoch: 4 Global Step: 54560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:27,289-Speed 3415.71 samples/sec Loss 6.7079 LearningRate 0.0609 Epoch: 4 Global Step: 54570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:30,329-Speed 3369.98 
samples/sec Loss 6.7501 LearningRate 0.0609 Epoch: 4 Global Step: 54580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:26:33,345-Speed 3396.39 samples/sec Loss 6.7403 LearningRate 0.0609 Epoch: 4 Global Step: 54590 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:36,380-Speed 3375.18 samples/sec Loss 6.8993 LearningRate 0.0609 Epoch: 4 Global Step: 54600 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:39,383-Speed 3410.64 samples/sec Loss 6.6977 LearningRate 0.0609 Epoch: 4 Global Step: 54610 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:42,402-Speed 3393.33 samples/sec Loss 6.6458 LearningRate 0.0609 Epoch: 4 Global Step: 54620 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:45,408-Speed 3407.24 samples/sec Loss 6.6515 LearningRate 0.0609 Epoch: 4 Global Step: 54630 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:48,429-Speed 3390.98 samples/sec Loss 6.9202 LearningRate 0.0608 Epoch: 4 Global Step: 54640 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:51,459-Speed 3380.86 samples/sec Loss 6.8175 LearningRate 0.0608 Epoch: 4 Global Step: 54650 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:54,504-Speed 3363.59 samples/sec Loss 6.9099 LearningRate 0.0608 Epoch: 4 Global Step: 54660 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:26:57,503-Speed 3416.08 samples/sec Loss 6.9320 LearningRate 0.0608 Epoch: 4 Global Step: 54670 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:27:00,552-Speed 3359.20 samples/sec Loss 6.8069 LearningRate 0.0608 Epoch: 4 Global Step: 54680 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:27:03,589-Speed 3373.42 samples/sec Loss 6.8481 LearningRate 0.0608 Epoch: 4 Global Step: 54690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:06,667-Speed 3327.74 samples/sec Loss 6.7409 LearningRate 0.0608 Epoch: 4 
Global Step: 54700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:09,675-Speed 3405.35 samples/sec Loss 6.7882 LearningRate 0.0608 Epoch: 4 Global Step: 54710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:12,737-Speed 3345.04 samples/sec Loss 6.8015 LearningRate 0.0608 Epoch: 4 Global Step: 54720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:15,875-Speed 3264.21 samples/sec Loss 6.7850 LearningRate 0.0608 Epoch: 4 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:18,995-Speed 3283.11 samples/sec Loss 6.9141 LearningRate 0.0608 Epoch: 4 Global Step: 54740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:22,010-Speed 3397.48 samples/sec Loss 6.8934 LearningRate 0.0608 Epoch: 4 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:25,123-Speed 3290.49 samples/sec Loss 6.8373 LearningRate 0.0608 Epoch: 4 Global Step: 54760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:28,207-Speed 3321.69 samples/sec Loss 6.8481 LearningRate 0.0608 Epoch: 4 Global Step: 54770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:31,310-Speed 3301.23 samples/sec Loss 6.8484 LearningRate 0.0608 Epoch: 4 Global Step: 54780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:27:34,352-Speed 3367.29 samples/sec Loss 6.9815 LearningRate 0.0608 Epoch: 4 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:27:37,470-Speed 3285.43 samples/sec Loss 6.9470 LearningRate 0.0607 Epoch: 4 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:27:40,524-Speed 3354.34 samples/sec Loss 6.8486 LearningRate 0.0607 Epoch: 4 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:27:43,648-Speed 3278.34 samples/sec Loss 6.8334 LearningRate 0.0607 Epoch: 4 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-27 06:27:46,664-Speed 3396.67 samples/sec Loss 6.8410 LearningRate 0.0607 Epoch: 4 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:27:49,704-Speed 3369.47 samples/sec Loss 6.7820 LearningRate 0.0607 Epoch: 4 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:27:52,852-Speed 3253.48 samples/sec Loss 6.9031 LearningRate 0.0607 Epoch: 4 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:27:55,864-Speed 3401.36 samples/sec Loss 6.7258 LearningRate 0.0607 Epoch: 4 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:27:58,906-Speed 3366.52 samples/sec Loss 6.8123 LearningRate 0.0607 Epoch: 4 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:28:01,968-Speed 3345.74 samples/sec Loss 6.8087 LearningRate 0.0607 Epoch: 4 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:28:05,054-Speed 3319.10 samples/sec Loss 6.8071 LearningRate 0.0607 Epoch: 4 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 06:28:08,086-Speed 3378.91 samples/sec Loss 6.8148 LearningRate 0.0607 Epoch: 4 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 06:28:11,120-Speed 3376.01 samples/sec Loss 6.9082 LearningRate 0.0607 Epoch: 4 Global Step: 54910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:28:14,203-Speed 3322.32 samples/sec Loss 6.7856 LearningRate 0.0607 Epoch: 4 Global Step: 54920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:28:17,265-Speed 3344.84 samples/sec Loss 6.7327 LearningRate 0.0607 Epoch: 4 Global Step: 54930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:28:20,268-Speed 3411.55 samples/sec Loss 6.9449 LearningRate 0.0607 Epoch: 4 Global Step: 54940 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:23,332-Speed 3343.40 
samples/sec Loss 6.9213 LearningRate 0.0607 Epoch: 4 Global Step: 54950 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:26,355-Speed 3388.73 samples/sec Loss 6.8140 LearningRate 0.0606 Epoch: 4 Global Step: 54960 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:29,448-Speed 3311.00 samples/sec Loss 6.8562 LearningRate 0.0606 Epoch: 4 Global Step: 54970 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:32,522-Speed 3332.75 samples/sec Loss 6.7971 LearningRate 0.0606 Epoch: 4 Global Step: 54980 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:35,571-Speed 3359.34 samples/sec Loss 6.8553 LearningRate 0.0606 Epoch: 4 Global Step: 54990 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:38,695-Speed 3278.51 samples/sec Loss 6.8688 LearningRate 0.0606 Epoch: 4 Global Step: 55000 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:41,811-Speed 3287.67 samples/sec Loss 6.7569 LearningRate 0.0606 Epoch: 4 Global Step: 55010 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:44,871-Speed 3346.75 samples/sec Loss 6.8997 LearningRate 0.0606 Epoch: 4 Global Step: 55020 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:47,947-Speed 3330.17 samples/sec Loss 6.8070 LearningRate 0.0606 Epoch: 4 Global Step: 55030 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-04-27 06:28:50,977-Speed 3380.63 samples/sec Loss 6.9622 LearningRate 0.0606 Epoch: 4 Global Step: 55040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:28:54,048-Speed 3335.15 samples/sec Loss 6.8301 LearningRate 0.0606 Epoch: 4 Global Step: 55050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:28:57,050-Speed 3412.78 samples/sec Loss 6.8390 LearningRate 0.0606 Epoch: 4 Global Step: 55060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:00,042-Speed 3423.31 samples/sec Loss 6.8647 LearningRate 0.0606 Epoch: 4 
Global Step: 55070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:03,139-Speed 3307.96 samples/sec Loss 6.9625 LearningRate 0.0606 Epoch: 4 Global Step: 55080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:06,225-Speed 3318.35 samples/sec Loss 6.7607 LearningRate 0.0606 Epoch: 4 Global Step: 55090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:09,239-Speed 3399.47 samples/sec Loss 6.7323 LearningRate 0.0606 Epoch: 4 Global Step: 55100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:12,330-Speed 3313.48 samples/sec Loss 6.8832 LearningRate 0.0606 Epoch: 4 Global Step: 55110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:15,401-Speed 3336.12 samples/sec Loss 6.9589 LearningRate 0.0605 Epoch: 4 Global Step: 55120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:18,450-Speed 3358.56 samples/sec Loss 6.8875 LearningRate 0.0605 Epoch: 4 Global Step: 55130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 06:29:21,478-Speed 3383.01 samples/sec Loss 6.8928 LearningRate 0.0605 Epoch: 4 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:29:24,538-Speed 3347.44 samples/sec Loss 6.7525 LearningRate 0.0605 Epoch: 4 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 06:29:27,629-Speed 3314.34 samples/sec Loss 6.9212 LearningRate 0.0605 Epoch: 4 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:29:30,672-Speed 3365.70 samples/sec Loss 6.8991 LearningRate 0.0605 Epoch: 4 Global Step: 55170 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:33,761-Speed 3316.50 samples/sec Loss 6.9627 LearningRate 0.0605 Epoch: 4 Global Step: 55180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:36,834-Speed 3333.85 samples/sec Loss 6.8404 LearningRate 0.0605 Epoch: 4 Global Step: 55190 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:29:39,906-Speed 3333.40 samples/sec Loss 6.8001 LearningRate 0.0605 Epoch: 4 Global Step: 55200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:42,968-Speed 3345.91 samples/sec Loss 6.8053 LearningRate 0.0605 Epoch: 4 Global Step: 55210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:45,972-Speed 3410.40 samples/sec Loss 6.9141 LearningRate 0.0605 Epoch: 4 Global Step: 55220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:49,013-Speed 3367.93 samples/sec Loss 6.8452 LearningRate 0.0605 Epoch: 4 Global Step: 55230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:52,122-Speed 3295.30 samples/sec Loss 6.7308 LearningRate 0.0605 Epoch: 4 Global Step: 55240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:55,161-Speed 3370.78 samples/sec Loss 6.8417 LearningRate 0.0605 Epoch: 4 Global Step: 55250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:29:58,204-Speed 3365.36 samples/sec Loss 6.8964 LearningRate 0.0605 Epoch: 4 Global Step: 55260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:30:01,244-Speed 3369.54 samples/sec Loss 6.8247 LearningRate 0.0605 Epoch: 4 Global Step: 55270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:30:04,271-Speed 3384.67 samples/sec Loss 6.8380 LearningRate 0.0604 Epoch: 4 Global Step: 55280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:30:07,342-Speed 3335.67 samples/sec Loss 6.9271 LearningRate 0.0604 Epoch: 4 Global Step: 55290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:30:10,350-Speed 3405.63 samples/sec Loss 6.8625 LearningRate 0.0604 Epoch: 4 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:30:13,399-Speed 3359.49 samples/sec Loss 6.8619 LearningRate 0.0604 Epoch: 4 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:30:16,479-Speed 3325.96 
samples/sec Loss 6.8798 LearningRate 0.0604 Epoch: 4 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:30:19,528-Speed 3359.27 samples/sec Loss 6.7474 LearningRate 0.0604 Epoch: 4 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:30:22,549-Speed 3390.43 samples/sec Loss 6.8970 LearningRate 0.0604 Epoch: 4 Global Step: 55340 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:30:25,610-Speed 3346.07 samples/sec Loss 6.8912 LearningRate 0.0604 Epoch: 4 Global Step: 55350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:30:28,658-Speed 3360.63 samples/sec Loss 6.8852 LearningRate 0.0604 Epoch: 4 Global Step: 55360 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:31,691-Speed 3377.68 samples/sec Loss 6.7683 LearningRate 0.0604 Epoch: 4 Global Step: 55370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:34,718-Speed 3383.90 samples/sec Loss 6.6415 LearningRate 0.0604 Epoch: 4 Global Step: 55380 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:37,774-Speed 3352.82 samples/sec Loss 6.9350 LearningRate 0.0604 Epoch: 4 Global Step: 55390 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:40,816-Speed 3367.18 samples/sec Loss 6.8546 LearningRate 0.0604 Epoch: 4 Global Step: 55400 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:43,859-Speed 3365.07 samples/sec Loss 6.8692 LearningRate 0.0604 Epoch: 4 Global Step: 55410 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:46,922-Speed 3345.01 samples/sec Loss 6.8319 LearningRate 0.0604 Epoch: 4 Global Step: 55420 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:50,068-Speed 3255.59 samples/sec Loss 6.8684 LearningRate 0.0604 Epoch: 4 Global Step: 55430 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:53,108-Speed 3369.34 samples/sec Loss 6.9289 LearningRate 0.0603 Epoch: 4 
Global Step: 55440 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:56,142-Speed 3375.66 samples/sec Loss 6.8925 LearningRate 0.0603 Epoch: 4 Global Step: 55450 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:30:59,152-Speed 3404.31 samples/sec Loss 6.8443 LearningRate 0.0603 Epoch: 4 Global Step: 55460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:02,210-Speed 3350.14 samples/sec Loss 6.8663 LearningRate 0.0603 Epoch: 4 Global Step: 55470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:05,310-Speed 3303.88 samples/sec Loss 6.8326 LearningRate 0.0603 Epoch: 4 Global Step: 55480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:08,336-Speed 3385.21 samples/sec Loss 6.9410 LearningRate 0.0603 Epoch: 4 Global Step: 55490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:11,380-Speed 3365.22 samples/sec Loss 6.8358 LearningRate 0.0603 Epoch: 4 Global Step: 55500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:14,428-Speed 3360.12 samples/sec Loss 6.7013 LearningRate 0.0603 Epoch: 4 Global Step: 55510 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:17,473-Speed 3364.54 samples/sec Loss 6.9503 LearningRate 0.0603 Epoch: 4 Global Step: 55520 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:20,492-Speed 3392.26 samples/sec Loss 6.8862 LearningRate 0.0603 Epoch: 4 Global Step: 55530 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:23,523-Speed 3379.64 samples/sec Loss 6.8157 LearningRate 0.0603 Epoch: 4 Global Step: 55540 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:26,549-Speed 3385.83 samples/sec Loss 6.7684 LearningRate 0.0603 Epoch: 4 Global Step: 55550 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:29,645-Speed 3307.97 samples/sec Loss 6.8686 LearningRate 0.0603 Epoch: 4 Global Step: 55560 Fp16 Grad Scale: 16384 Required: 16 
hours Training: 2022-04-27 06:31:32,661-Speed 3395.76 samples/sec Loss 6.8279 LearningRate 0.0603 Epoch: 4 Global Step: 55570 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:35,721-Speed 3348.22 samples/sec Loss 6.9316 LearningRate 0.0603 Epoch: 4 Global Step: 55580 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:38,818-Speed 3307.62 samples/sec Loss 6.7737 LearningRate 0.0603 Epoch: 4 Global Step: 55590 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:41,897-Speed 3326.29 samples/sec Loss 6.8395 LearningRate 0.0602 Epoch: 4 Global Step: 55600 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:31:44,937-Speed 3369.36 samples/sec Loss 6.9260 LearningRate 0.0602 Epoch: 4 Global Step: 55610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:47,947-Speed 3403.85 samples/sec Loss 6.7795 LearningRate 0.0602 Epoch: 4 Global Step: 55620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:51,022-Speed 3330.06 samples/sec Loss 6.9177 LearningRate 0.0602 Epoch: 4 Global Step: 55630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:54,119-Speed 3308.41 samples/sec Loss 6.7946 LearningRate 0.0602 Epoch: 4 Global Step: 55640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:31:57,181-Speed 3344.68 samples/sec Loss 6.7777 LearningRate 0.0602 Epoch: 4 Global Step: 55650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:00,239-Speed 3350.33 samples/sec Loss 6.8495 LearningRate 0.0602 Epoch: 4 Global Step: 55660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:03,323-Speed 3320.52 samples/sec Loss 6.8856 LearningRate 0.0602 Epoch: 4 Global Step: 55670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:06,340-Speed 3395.87 samples/sec Loss 6.9335 LearningRate 0.0602 Epoch: 4 Global Step: 55680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:09,359-Speed 3392.36 
samples/sec Loss 6.7794 LearningRate 0.0602 Epoch: 4 Global Step: 55690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:12,403-Speed 3366.30 samples/sec Loss 6.9090 LearningRate 0.0602 Epoch: 4 Global Step: 55700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:15,432-Speed 3381.89 samples/sec Loss 6.7232 LearningRate 0.0602 Epoch: 4 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:32:18,472-Speed 3368.97 samples/sec Loss 6.9114 LearningRate 0.0602 Epoch: 4 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:32:21,483-Speed 3402.11 samples/sec Loss 6.9574 LearningRate 0.0602 Epoch: 4 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:32:24,513-Speed 3380.95 samples/sec Loss 6.7792 LearningRate 0.0602 Epoch: 4 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:32:27,571-Speed 3349.10 samples/sec Loss 6.8384 LearningRate 0.0602 Epoch: 4 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:32:30,616-Speed 3363.91 samples/sec Loss 6.7983 LearningRate 0.0601 Epoch: 4 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:32:33,628-Speed 3400.50 samples/sec Loss 6.8304 LearningRate 0.0601 Epoch: 4 Global Step: 55770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:32:36,696-Speed 3339.33 samples/sec Loss 6.7561 LearningRate 0.0601 Epoch: 4 Global Step: 55780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:39,750-Speed 3353.89 samples/sec Loss 6.8576 LearningRate 0.0601 Epoch: 4 Global Step: 55790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:42,795-Speed 3363.80 samples/sec Loss 6.9402 LearningRate 0.0601 Epoch: 4 Global Step: 55800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:45,827-Speed 3378.87 samples/sec Loss 6.8693 LearningRate 0.0601 Epoch: 4 
Global Step: 55810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:48,917-Speed 3315.22 samples/sec Loss 6.9055 LearningRate 0.0601 Epoch: 4 Global Step: 55820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:52,007-Speed 3314.70 samples/sec Loss 6.8822 LearningRate 0.0601 Epoch: 4 Global Step: 55830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:55,025-Speed 3394.39 samples/sec Loss 6.7994 LearningRate 0.0601 Epoch: 4 Global Step: 55840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:32:58,043-Speed 3393.51 samples/sec Loss 6.8663 LearningRate 0.0601 Epoch: 4 Global Step: 55850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:01,079-Speed 3374.80 samples/sec Loss 6.8251 LearningRate 0.0601 Epoch: 4 Global Step: 55860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:04,157-Speed 3327.05 samples/sec Loss 6.8797 LearningRate 0.0601 Epoch: 4 Global Step: 55870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:07,216-Speed 3349.21 samples/sec Loss 6.9622 LearningRate 0.0601 Epoch: 4 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:33:10,205-Speed 3425.98 samples/sec Loss 6.7625 LearningRate 0.0601 Epoch: 4 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:33:13,226-Speed 3391.23 samples/sec Loss 6.8308 LearningRate 0.0601 Epoch: 4 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:33:16,289-Speed 3343.99 samples/sec Loss 6.8104 LearningRate 0.0601 Epoch: 4 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:33:19,409-Speed 3283.78 samples/sec Loss 6.8693 LearningRate 0.0600 Epoch: 4 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:33:22,469-Speed 3347.94 samples/sec Loss 6.8224 LearningRate 0.0600 Epoch: 4 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 06:33:25,622-Speed 3248.24 samples/sec Loss 6.9010 LearningRate 0.0600 Epoch: 4 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:33:28,710-Speed 3317.24 samples/sec Loss 6.8719 LearningRate 0.0600 Epoch: 4 Global Step: 55950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:31,729-Speed 3393.17 samples/sec Loss 6.7447 LearningRate 0.0600 Epoch: 4 Global Step: 55960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:34,755-Speed 3384.37 samples/sec Loss 6.9583 LearningRate 0.0600 Epoch: 4 Global Step: 55970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:37,845-Speed 3314.84 samples/sec Loss 6.9575 LearningRate 0.0600 Epoch: 4 Global Step: 55980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:40,922-Speed 3329.33 samples/sec Loss 6.7777 LearningRate 0.0600 Epoch: 4 Global Step: 55990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:43,982-Speed 3346.96 samples/sec Loss 6.8724 LearningRate 0.0600 Epoch: 4 Global Step: 56000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:47,026-Speed 3365.86 samples/sec Loss 6.8693 LearningRate 0.0600 Epoch: 4 Global Step: 56010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:50,117-Speed 3314.09 samples/sec Loss 6.8637 LearningRate 0.0600 Epoch: 4 Global Step: 56020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:53,140-Speed 3388.73 samples/sec Loss 6.7948 LearningRate 0.0600 Epoch: 4 Global Step: 56030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:56,192-Speed 3355.60 samples/sec Loss 6.7401 LearningRate 0.0600 Epoch: 4 Global Step: 56040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:33:59,202-Speed 3403.30 samples/sec Loss 6.9986 LearningRate 0.0600 Epoch: 4 Global Step: 56050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:02,300-Speed 3306.77 
samples/sec Loss 6.8838 LearningRate 0.0600 Epoch: 4 Global Step: 56060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:05,381-Speed 3324.20 samples/sec Loss 6.9940 LearningRate 0.0600 Epoch: 4 Global Step: 56070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:08,434-Speed 3355.04 samples/sec Loss 7.0090 LearningRate 0.0599 Epoch: 4 Global Step: 56080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:11,496-Speed 3345.83 samples/sec Loss 6.9033 LearningRate 0.0599 Epoch: 4 Global Step: 56090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:14,621-Speed 3277.24 samples/sec Loss 6.9203 LearningRate 0.0599 Epoch: 4 Global Step: 56100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:17,651-Speed 3381.42 samples/sec Loss 6.9473 LearningRate 0.0599 Epoch: 4 Global Step: 56110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:20,694-Speed 3365.84 samples/sec Loss 6.9254 LearningRate 0.0599 Epoch: 4 Global Step: 56120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:23,789-Speed 3308.85 samples/sec Loss 6.9042 LearningRate 0.0599 Epoch: 4 Global Step: 56130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:26,849-Speed 3348.25 samples/sec Loss 6.9536 LearningRate 0.0599 Epoch: 4 Global Step: 56140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:29,873-Speed 3386.97 samples/sec Loss 6.8644 LearningRate 0.0599 Epoch: 4 Global Step: 56150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:34:32,926-Speed 3355.58 samples/sec Loss 6.9083 LearningRate 0.0599 Epoch: 4 Global Step: 56160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:34:35,946-Speed 3391.91 samples/sec Loss 6.9082 LearningRate 0.0599 Epoch: 4 Global Step: 56170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:34:38,994-Speed 3360.70 samples/sec Loss 6.7213 LearningRate 0.0599 Epoch: 4 
Global Step: 56180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:34:42,060-Speed 3340.91 samples/sec Loss 6.8043 LearningRate 0.0599 Epoch: 4 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:45,097-Speed 3372.36 samples/sec Loss 6.9065 LearningRate 0.0599 Epoch: 4 Global Step: 56200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:48,154-Speed 3350.89 samples/sec Loss 6.7853 LearningRate 0.0599 Epoch: 4 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:51,218-Speed 3343.76 samples/sec Loss 6.9160 LearningRate 0.0599 Epoch: 4 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:54,264-Speed 3362.19 samples/sec Loss 6.9789 LearningRate 0.0599 Epoch: 4 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:34:57,300-Speed 3374.01 samples/sec Loss 6.9549 LearningRate 0.0598 Epoch: 4 Global Step: 56240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:35:00,358-Speed 3349.86 samples/sec Loss 6.8831 LearningRate 0.0598 Epoch: 4 Global Step: 56250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:35:03,408-Speed 3358.11 samples/sec Loss 6.7880 LearningRate 0.0598 Epoch: 4 Global Step: 56260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:35:06,436-Speed 3383.22 samples/sec Loss 6.7779 LearningRate 0.0598 Epoch: 4 Global Step: 56270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:35:09,455-Speed 3392.62 samples/sec Loss 6.9586 LearningRate 0.0598 Epoch: 4 Global Step: 56280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:35:12,509-Speed 3354.37 samples/sec Loss 6.9367 LearningRate 0.0598 Epoch: 4 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:35:15,551-Speed 3367.60 samples/sec Loss 6.8211 LearningRate 0.0598 Epoch: 4 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 06:35:18,602-Speed 3357.59 samples/sec Loss 6.9260 LearningRate 0.0598 Epoch: 4 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:35:21,603-Speed 3412.90 samples/sec Loss 6.8549 LearningRate 0.0598 Epoch: 4 Global Step: 56320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:35:24,675-Speed 3334.61 samples/sec Loss 6.7991 LearningRate 0.0598 Epoch: 4 Global Step: 56330 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:27,699-Speed 3386.29 samples/sec Loss 6.8801 LearningRate 0.0598 Epoch: 4 Global Step: 56340 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:30,796-Speed 3308.30 samples/sec Loss 6.9206 LearningRate 0.0598 Epoch: 4 Global Step: 56350 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:33,808-Speed 3399.88 samples/sec Loss 6.8905 LearningRate 0.0598 Epoch: 4 Global Step: 56360 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:36,900-Speed 3313.54 samples/sec Loss 6.8635 LearningRate 0.0598 Epoch: 4 Global Step: 56370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:40,017-Speed 3285.79 samples/sec Loss 6.8160 LearningRate 0.0598 Epoch: 4 Global Step: 56380 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:43,080-Speed 3344.64 samples/sec Loss 6.8544 LearningRate 0.0598 Epoch: 4 Global Step: 56390 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:46,143-Speed 3344.84 samples/sec Loss 6.9341 LearningRate 0.0597 Epoch: 4 Global Step: 56400 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:49,178-Speed 3375.14 samples/sec Loss 6.8144 LearningRate 0.0597 Epoch: 4 Global Step: 56410 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:52,250-Speed 3333.81 samples/sec Loss 6.8996 LearningRate 0.0597 Epoch: 4 Global Step: 56420 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:35:55,322-Speed 3334.74 
samples/sec Loss 6.8197 LearningRate 0.0597 Epoch: 4 Global Step: 56430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:35:58,342-Speed 3391.53 samples/sec Loss 6.7500 LearningRate 0.0597 Epoch: 4 Global Step: 56440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:01,378-Speed 3374.60 samples/sec Loss 6.8444 LearningRate 0.0597 Epoch: 4 Global Step: 56450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:04,419-Speed 3368.04 samples/sec Loss 6.9248 LearningRate 0.0597 Epoch: 4 Global Step: 56460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:07,456-Speed 3372.68 samples/sec Loss 6.8427 LearningRate 0.0597 Epoch: 4 Global Step: 56470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:10,598-Speed 3259.70 samples/sec Loss 6.8348 LearningRate 0.0597 Epoch: 4 Global Step: 56480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:13,642-Speed 3365.42 samples/sec Loss 6.8356 LearningRate 0.0597 Epoch: 4 Global Step: 56490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:16,691-Speed 3359.66 samples/sec Loss 6.8717 LearningRate 0.0597 Epoch: 4 Global Step: 56500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:19,732-Speed 3367.35 samples/sec Loss 6.9760 LearningRate 0.0597 Epoch: 4 Global Step: 56510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:22,788-Speed 3352.42 samples/sec Loss 6.7945 LearningRate 0.0597 Epoch: 4 Global Step: 56520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:25,820-Speed 3378.20 samples/sec Loss 6.9099 LearningRate 0.0597 Epoch: 4 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:36:28,830-Speed 3403.52 samples/sec Loss 6.8928 LearningRate 0.0597 Epoch: 4 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:36:31,854-Speed 3386.73 samples/sec Loss 6.9169 LearningRate 0.0597 Epoch: 4 
Global Step: 56550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:36:34,906-Speed 3356.61 samples/sec Loss 6.8044 LearningRate 0.0596 Epoch: 4 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:36:37,944-Speed 3371.24 samples/sec Loss 6.9329 LearningRate 0.0596 Epoch: 4 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:36:41,007-Speed 3344.48 samples/sec Loss 6.8740 LearningRate 0.0596 Epoch: 4 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:36:44,007-Speed 3414.63 samples/sec Loss 6.7588 LearningRate 0.0596 Epoch: 4 Global Step: 56590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:47,085-Speed 3328.06 samples/sec Loss 6.8257 LearningRate 0.0596 Epoch: 4 Global Step: 56600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:50,158-Speed 3333.05 samples/sec Loss 6.9156 LearningRate 0.0596 Epoch: 4 Global Step: 56610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:53,257-Speed 3305.26 samples/sec Loss 6.8031 LearningRate 0.0596 Epoch: 4 Global Step: 56620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:56,314-Speed 3351.20 samples/sec Loss 6.7380 LearningRate 0.0596 Epoch: 4 Global Step: 56630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:36:59,332-Speed 3393.31 samples/sec Loss 6.8816 LearningRate 0.0596 Epoch: 4 Global Step: 56640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:02,349-Speed 3395.18 samples/sec Loss 6.8021 LearningRate 0.0596 Epoch: 4 Global Step: 56650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:05,398-Speed 3360.23 samples/sec Loss 6.9154 LearningRate 0.0596 Epoch: 4 Global Step: 56660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:08,426-Speed 3382.01 samples/sec Loss 6.9752 LearningRate 0.0596 Epoch: 4 Global Step: 56670 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:37:11,466-Speed 3369.85 samples/sec Loss 6.8030 LearningRate 0.0596 Epoch: 4 Global Step: 56680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:14,498-Speed 3379.16 samples/sec Loss 6.8639 LearningRate 0.0596 Epoch: 4 Global Step: 56690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:37:17,589-Speed 3313.31 samples/sec Loss 6.8495 LearningRate 0.0596 Epoch: 4 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:37:20,633-Speed 3364.92 samples/sec Loss 6.8736 LearningRate 0.0596 Epoch: 4 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:37:23,640-Speed 3406.39 samples/sec Loss 6.8665 LearningRate 0.0595 Epoch: 4 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:37:26,657-Speed 3395.19 samples/sec Loss 6.8550 LearningRate 0.0595 Epoch: 4 Global Step: 56730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:29,683-Speed 3385.08 samples/sec Loss 6.9461 LearningRate 0.0595 Epoch: 4 Global Step: 56740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:32,678-Speed 3419.89 samples/sec Loss 6.8103 LearningRate 0.0595 Epoch: 4 Global Step: 56750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:35,687-Speed 3404.15 samples/sec Loss 6.7819 LearningRate 0.0595 Epoch: 4 Global Step: 56760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:38,792-Speed 3298.95 samples/sec Loss 6.9466 LearningRate 0.0595 Epoch: 4 Global Step: 56770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:41,858-Speed 3341.26 samples/sec Loss 6.8486 LearningRate 0.0595 Epoch: 4 Global Step: 56780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:44,875-Speed 3394.81 samples/sec Loss 6.9466 LearningRate 0.0595 Epoch: 4 Global Step: 56790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:47,913-Speed 3371.56 
samples/sec Loss 6.8361 LearningRate 0.0595 Epoch: 4 Global Step: 56800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:50,953-Speed 3370.35 samples/sec Loss 7.0021 LearningRate 0.0595 Epoch: 4 Global Step: 56810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:53,992-Speed 3370.09 samples/sec Loss 6.9548 LearningRate 0.0595 Epoch: 4 Global Step: 56820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:37:57,013-Speed 3390.64 samples/sec Loss 6.9252 LearningRate 0.0595 Epoch: 4 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:38:00,062-Speed 3360.15 samples/sec Loss 6.8626 LearningRate 0.0595 Epoch: 4 Global Step: 56840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:03,122-Speed 3347.35 samples/sec Loss 6.9823 LearningRate 0.0595 Epoch: 4 Global Step: 56850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:06,144-Speed 3389.38 samples/sec Loss 6.7828 LearningRate 0.0595 Epoch: 4 Global Step: 56860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:09,141-Speed 3417.19 samples/sec Loss 6.8360 LearningRate 0.0595 Epoch: 4 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:12,165-Speed 3387.52 samples/sec Loss 6.9465 LearningRate 0.0594 Epoch: 4 Global Step: 56880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:15,189-Speed 3387.49 samples/sec Loss 7.0022 LearningRate 0.0594 Epoch: 4 Global Step: 56890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:18,299-Speed 3293.44 samples/sec Loss 6.7278 LearningRate 0.0594 Epoch: 4 Global Step: 56900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:21,300-Speed 3414.30 samples/sec Loss 6.9336 LearningRate 0.0594 Epoch: 4 Global Step: 56910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:24,324-Speed 3387.23 samples/sec Loss 6.8985 LearningRate 0.0594 Epoch: 4 
Global Step: 56920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:27,332-Speed 3405.44 samples/sec Loss 6.8500 LearningRate 0.0594 Epoch: 4 Global Step: 56930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:30,374-Speed 3366.93 samples/sec Loss 6.9627 LearningRate 0.0594 Epoch: 4 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:38:33,393-Speed 3393.47 samples/sec Loss 6.8556 LearningRate 0.0594 Epoch: 4 Global Step: 56950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:36,400-Speed 3405.69 samples/sec Loss 6.9445 LearningRate 0.0594 Epoch: 4 Global Step: 56960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:39,477-Speed 3328.39 samples/sec Loss 6.8507 LearningRate 0.0594 Epoch: 4 Global Step: 56970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:42,505-Speed 3383.62 samples/sec Loss 6.8479 LearningRate 0.0594 Epoch: 4 Global Step: 56980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:45,509-Speed 3410.11 samples/sec Loss 6.8370 LearningRate 0.0594 Epoch: 4 Global Step: 56990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:48,538-Speed 3382.01 samples/sec Loss 6.8558 LearningRate 0.0594 Epoch: 4 Global Step: 57000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:51,582-Speed 3364.78 samples/sec Loss 6.9309 LearningRate 0.0594 Epoch: 4 Global Step: 57010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:54,636-Speed 3354.93 samples/sec Loss 6.9882 LearningRate 0.0594 Epoch: 4 Global Step: 57020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:38:57,637-Speed 3413.45 samples/sec Loss 7.0360 LearningRate 0.0594 Epoch: 4 Global Step: 57030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:39:00,662-Speed 3385.77 samples/sec Loss 6.9467 LearningRate 0.0593 Epoch: 4 Global Step: 57040 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:39:03,726-Speed 3343.40 samples/sec Loss 6.9056 LearningRate 0.0593 Epoch: 4 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:39:06,779-Speed 3354.68 samples/sec Loss 6.8194 LearningRate 0.0593 Epoch: 4 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:39:09,779-Speed 3415.26 samples/sec Loss 6.9165 LearningRate 0.0593 Epoch: 4 Global Step: 57070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:39:12,813-Speed 3375.10 samples/sec Loss 6.8669 LearningRate 0.0593 Epoch: 4 Global Step: 57080 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:15,900-Speed 3318.13 samples/sec Loss 6.9303 LearningRate 0.0593 Epoch: 4 Global Step: 57090 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:18,963-Speed 3344.41 samples/sec Loss 6.9495 LearningRate 0.0593 Epoch: 4 Global Step: 57100 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:21,982-Speed 3392.96 samples/sec Loss 6.7895 LearningRate 0.0593 Epoch: 4 Global Step: 57110 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:25,016-Speed 3376.23 samples/sec Loss 7.0131 LearningRate 0.0593 Epoch: 4 Global Step: 57120 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:28,088-Speed 3335.13 samples/sec Loss 6.9065 LearningRate 0.0593 Epoch: 4 Global Step: 57130 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:31,170-Speed 3323.04 samples/sec Loss 6.8802 LearningRate 0.0593 Epoch: 4 Global Step: 57140 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:34,190-Speed 3391.80 samples/sec Loss 7.0122 LearningRate 0.0593 Epoch: 4 Global Step: 57150 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:37,264-Speed 3332.02 samples/sec Loss 6.9166 LearningRate 0.0593 Epoch: 4 Global Step: 57160 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:40,314-Speed 3358.61 
samples/sec Loss 6.9173 LearningRate 0.0593 Epoch: 4 Global Step: 57170 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:39:43,338-Speed 3387.65 samples/sec Loss 7.0148 LearningRate 0.0593 Epoch: 4 Global Step: 57180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:39:46,348-Speed 3402.15 samples/sec Loss 6.9372 LearningRate 0.0593 Epoch: 4 Global Step: 57190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:39:49,376-Speed 3383.06 samples/sec Loss 6.9183 LearningRate 0.0593 Epoch: 4 Global Step: 57200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:39:52,464-Speed 3317.53 samples/sec Loss 6.8626 LearningRate 0.0592 Epoch: 4 Global Step: 57210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:39:55,576-Speed 3291.56 samples/sec Loss 6.8967 LearningRate 0.0592 Epoch: 4 Global Step: 57220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:39:58,599-Speed 3388.59 samples/sec Loss 6.9552 LearningRate 0.0592 Epoch: 4 Global Step: 57230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:01,705-Speed 3297.64 samples/sec Loss 6.8808 LearningRate 0.0592 Epoch: 4 Global Step: 57240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:04,757-Speed 3356.81 samples/sec Loss 6.9182 LearningRate 0.0592 Epoch: 4 Global Step: 57250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:07,777-Speed 3391.49 samples/sec Loss 6.9337 LearningRate 0.0592 Epoch: 4 Global Step: 57260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:10,799-Speed 3390.07 samples/sec Loss 7.0449 LearningRate 0.0592 Epoch: 4 Global Step: 57270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:13,801-Speed 3412.15 samples/sec Loss 6.8060 LearningRate 0.0592 Epoch: 4 Global Step: 57280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:16,868-Speed 3338.70 samples/sec Loss 6.9837 LearningRate 0.0592 Epoch: 4 
Global Step: 57290 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:19,938-Speed 3336.61 samples/sec Loss 6.9635 LearningRate 0.0592 Epoch: 4 Global Step: 57300 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:22,929-Speed 3425.22 samples/sec Loss 6.7937 LearningRate 0.0592 Epoch: 4 Global Step: 57310 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:25,956-Speed 3384.11 samples/sec Loss 6.8213 LearningRate 0.0592 Epoch: 4 Global Step: 57320 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:29,005-Speed 3360.13 samples/sec Loss 6.8366 LearningRate 0.0592 Epoch: 4 Global Step: 57330 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:32,034-Speed 3382.03 samples/sec Loss 6.9293 LearningRate 0.0592 Epoch: 4 Global Step: 57340 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:35,067-Speed 3376.64 samples/sec Loss 6.8616 LearningRate 0.0592 Epoch: 4 Global Step: 57350 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:38,098-Speed 3379.90 samples/sec Loss 6.8543 LearningRate 0.0592 Epoch: 4 Global Step: 57360 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:41,189-Speed 3313.72 samples/sec Loss 6.7445 LearningRate 0.0591 Epoch: 4 Global Step: 57370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:44,228-Speed 3370.08 samples/sec Loss 6.8440 LearningRate 0.0591 Epoch: 4 Global Step: 57380 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:40:47,265-Speed 3373.16 samples/sec Loss 6.9705 LearningRate 0.0591 Epoch: 4 Global Step: 57390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:50,336-Speed 3335.13 samples/sec Loss 6.8303 LearningRate 0.0591 Epoch: 4 Global Step: 57400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:53,393-Speed 3351.56 samples/sec Loss 6.7961 LearningRate 0.0591 Epoch: 4 Global Step: 57410 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:40:56,412-Speed 3392.87 samples/sec Loss 6.8745 LearningRate 0.0591 Epoch: 4 Global Step: 57420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:40:59,473-Speed 3346.00 samples/sec Loss 6.8051 LearningRate 0.0591 Epoch: 4 Global Step: 57430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:02,524-Speed 3357.07 samples/sec Loss 6.8703 LearningRate 0.0591 Epoch: 4 Global Step: 57440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:05,567-Speed 3366.87 samples/sec Loss 6.8938 LearningRate 0.0591 Epoch: 4 Global Step: 57450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:08,615-Speed 3360.20 samples/sec Loss 6.8587 LearningRate 0.0591 Epoch: 4 Global Step: 57460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:11,718-Speed 3301.22 samples/sec Loss 6.8307 LearningRate 0.0591 Epoch: 4 Global Step: 57470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:14,833-Speed 3288.76 samples/sec Loss 6.8493 LearningRate 0.0591 Epoch: 4 Global Step: 57480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:17,866-Speed 3376.27 samples/sec Loss 6.8251 LearningRate 0.0591 Epoch: 4 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:41:20,903-Speed 3372.99 samples/sec Loss 6.8614 LearningRate 0.0591 Epoch: 4 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:41:23,960-Speed 3351.56 samples/sec Loss 6.8651 LearningRate 0.0591 Epoch: 4 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:41:27,029-Speed 3337.42 samples/sec Loss 7.0051 LearningRate 0.0591 Epoch: 4 Global Step: 57520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:30,077-Speed 3360.39 samples/sec Loss 6.8821 LearningRate 0.0590 Epoch: 4 Global Step: 57530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:33,094-Speed 3394.99 
samples/sec Loss 6.9618 LearningRate 0.0590 Epoch: 4 Global Step: 57540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:36,115-Speed 3391.50 samples/sec Loss 7.0310 LearningRate 0.0590 Epoch: 4 Global Step: 57550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:39,135-Speed 3391.38 samples/sec Loss 6.8843 LearningRate 0.0590 Epoch: 4 Global Step: 57560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:42,142-Speed 3407.17 samples/sec Loss 6.8917 LearningRate 0.0590 Epoch: 4 Global Step: 57570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:45,154-Speed 3400.76 samples/sec Loss 6.9679 LearningRate 0.0590 Epoch: 4 Global Step: 57580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:48,266-Speed 3291.91 samples/sec Loss 6.9818 LearningRate 0.0590 Epoch: 4 Global Step: 57590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:51,328-Speed 3344.42 samples/sec Loss 6.9053 LearningRate 0.0590 Epoch: 4 Global Step: 57600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:54,356-Speed 3383.29 samples/sec Loss 6.9377 LearningRate 0.0590 Epoch: 4 Global Step: 57610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:41:57,393-Speed 3372.60 samples/sec Loss 6.9953 LearningRate 0.0590 Epoch: 4 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:42:00,424-Speed 3379.98 samples/sec Loss 6.9316 LearningRate 0.0590 Epoch: 4 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:42:03,524-Speed 3303.88 samples/sec Loss 6.8980 LearningRate 0.0590 Epoch: 4 Global Step: 57640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:06,605-Speed 3324.60 samples/sec Loss 6.8541 LearningRate 0.0590 Epoch: 4 Global Step: 57650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:09,621-Speed 3396.52 samples/sec Loss 6.8734 LearningRate 0.0590 Epoch: 4 
Global Step: 57660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:12,627-Speed 3407.65 samples/sec Loss 6.8648 LearningRate 0.0590 Epoch: 4 Global Step: 57670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:15,646-Speed 3393.25 samples/sec Loss 6.8789 LearningRate 0.0590 Epoch: 4 Global Step: 57680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:18,670-Speed 3386.87 samples/sec Loss 6.8303 LearningRate 0.0589 Epoch: 4 Global Step: 57690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:21,691-Speed 3391.04 samples/sec Loss 6.8687 LearningRate 0.0589 Epoch: 4 Global Step: 57700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:24,777-Speed 3319.05 samples/sec Loss 6.9011 LearningRate 0.0589 Epoch: 4 Global Step: 57710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:27,812-Speed 3375.46 samples/sec Loss 7.0172 LearningRate 0.0589 Epoch: 4 Global Step: 57720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:30,822-Speed 3403.16 samples/sec Loss 6.8700 LearningRate 0.0589 Epoch: 4 Global Step: 57730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:33,869-Speed 3361.79 samples/sec Loss 6.8388 LearningRate 0.0589 Epoch: 4 Global Step: 57740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:42:36,933-Speed 3343.81 samples/sec Loss 6.9067 LearningRate 0.0589 Epoch: 4 Global Step: 57750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:39,962-Speed 3381.41 samples/sec Loss 6.8337 LearningRate 0.0589 Epoch: 4 Global Step: 57760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:43,034-Speed 3334.24 samples/sec Loss 6.8626 LearningRate 0.0589 Epoch: 4 Global Step: 57770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:46,049-Speed 3397.51 samples/sec Loss 6.9326 LearningRate 0.0589 Epoch: 4 Global Step: 57780 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:42:49,070-Speed 3390.34 samples/sec Loss 6.9153 LearningRate 0.0589 Epoch: 4 Global Step: 57790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:52,188-Speed 3285.49 samples/sec Loss 6.9339 LearningRate 0.0589 Epoch: 4 Global Step: 57800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:55,242-Speed 3353.99 samples/sec Loss 6.8277 LearningRate 0.0589 Epoch: 4 Global Step: 57810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:42:58,254-Speed 3401.51 samples/sec Loss 6.8734 LearningRate 0.0589 Epoch: 4 Global Step: 57820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:01,256-Speed 3412.06 samples/sec Loss 6.9221 LearningRate 0.0589 Epoch: 4 Global Step: 57830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:04,339-Speed 3321.90 samples/sec Loss 6.8575 LearningRate 0.0589 Epoch: 4 Global Step: 57840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:07,349-Speed 3403.66 samples/sec Loss 6.8733 LearningRate 0.0588 Epoch: 4 Global Step: 57850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:10,362-Speed 3399.50 samples/sec Loss 7.0689 LearningRate 0.0588 Epoch: 4 Global Step: 57860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:13,420-Speed 3349.66 samples/sec Loss 6.9681 LearningRate 0.0588 Epoch: 4 Global Step: 57870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:16,527-Speed 3296.90 samples/sec Loss 6.9002 LearningRate 0.0588 Epoch: 4 Global Step: 57880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:19,612-Speed 3319.88 samples/sec Loss 6.9249 LearningRate 0.0588 Epoch: 4 Global Step: 57890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:22,625-Speed 3399.24 samples/sec Loss 6.9478 LearningRate 0.0588 Epoch: 4 Global Step: 57900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:25,669-Speed 3365.39 
samples/sec Loss 6.9117 LearningRate 0.0588 Epoch: 4 Global Step: 57910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:28,748-Speed 3327.58 samples/sec Loss 6.8018 LearningRate 0.0588 Epoch: 4 Global Step: 57920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:31,766-Speed 3393.57 samples/sec Loss 6.9311 LearningRate 0.0588 Epoch: 4 Global Step: 57930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:34,825-Speed 3348.92 samples/sec Loss 6.7437 LearningRate 0.0588 Epoch: 4 Global Step: 57940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:37,902-Speed 3328.94 samples/sec Loss 6.9300 LearningRate 0.0588 Epoch: 4 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:43:40,973-Speed 3335.53 samples/sec Loss 6.8969 LearningRate 0.0588 Epoch: 4 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:43:43,980-Speed 3406.27 samples/sec Loss 6.9192 LearningRate 0.0588 Epoch: 4 Global Step: 57970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:47,024-Speed 3365.58 samples/sec Loss 7.0629 LearningRate 0.0588 Epoch: 4 Global Step: 57980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:50,136-Speed 3290.83 samples/sec Loss 7.0359 LearningRate 0.0588 Epoch: 4 Global Step: 57990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:53,182-Speed 3363.80 samples/sec Loss 6.9456 LearningRate 0.0588 Epoch: 4 Global Step: 58000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:56,190-Speed 3405.40 samples/sec Loss 6.9095 LearningRate 0.0587 Epoch: 4 Global Step: 58010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:43:59,224-Speed 3376.31 samples/sec Loss 6.8406 LearningRate 0.0587 Epoch: 4 Global Step: 58020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:02,302-Speed 3327.89 samples/sec Loss 6.9076 LearningRate 0.0587 Epoch: 4 
Global Step: 58030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:05,366-Speed 3342.78 samples/sec Loss 6.9638 LearningRate 0.0587 Epoch: 4 Global Step: 58040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:08,434-Speed 3337.82 samples/sec Loss 7.0027 LearningRate 0.0587 Epoch: 4 Global Step: 58050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:11,487-Speed 3356.13 samples/sec Loss 6.9123 LearningRate 0.0587 Epoch: 4 Global Step: 58060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:14,538-Speed 3357.02 samples/sec Loss 6.7888 LearningRate 0.0587 Epoch: 4 Global Step: 58070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:44:17,579-Speed 3368.71 samples/sec Loss 6.9489 LearningRate 0.0587 Epoch: 4 Global Step: 58080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:44:20,590-Speed 3402.49 samples/sec Loss 6.9075 LearningRate 0.0587 Epoch: 4 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:44:23,594-Speed 3410.07 samples/sec Loss 6.8639 LearningRate 0.0587 Epoch: 4 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:44:26,648-Speed 3353.94 samples/sec Loss 6.7663 LearningRate 0.0587 Epoch: 4 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:44:29,658-Speed 3403.14 samples/sec Loss 6.9077 LearningRate 0.0587 Epoch: 4 Global Step: 58120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:32,726-Speed 3339.09 samples/sec Loss 6.8626 LearningRate 0.0587 Epoch: 4 Global Step: 58130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:35,752-Speed 3384.92 samples/sec Loss 6.9164 LearningRate 0.0587 Epoch: 4 Global Step: 58140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:38,803-Speed 3357.65 samples/sec Loss 6.8805 LearningRate 0.0587 Epoch: 4 Global Step: 58150 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:44:41,913-Speed 3293.09 samples/sec Loss 6.8231 LearningRate 0.0587 Epoch: 4 Global Step: 58160 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:44,994-Speed 3324.65 samples/sec Loss 6.9436 LearningRate 0.0587 Epoch: 4 Global Step: 58170 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:48,059-Speed 3342.43 samples/sec Loss 6.8388 LearningRate 0.0586 Epoch: 4 Global Step: 58180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:51,067-Speed 3405.85 samples/sec Loss 6.9063 LearningRate 0.0586 Epoch: 4 Global Step: 58190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:54,144-Speed 3329.13 samples/sec Loss 6.8784 LearningRate 0.0586 Epoch: 4 Global Step: 58200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:44:57,142-Speed 3416.11 samples/sec Loss 6.9389 LearningRate 0.0586 Epoch: 4 Global Step: 58210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:00,190-Speed 3360.73 samples/sec Loss 6.8482 LearningRate 0.0586 Epoch: 4 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:45:03,263-Speed 3334.08 samples/sec Loss 6.8117 LearningRate 0.0586 Epoch: 4 Global Step: 58230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:45:06,294-Speed 3379.62 samples/sec Loss 6.8646 LearningRate 0.0586 Epoch: 4 Global Step: 58240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:09,321-Speed 3383.13 samples/sec Loss 6.8642 LearningRate 0.0586 Epoch: 4 Global Step: 58250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:12,396-Speed 3331.89 samples/sec Loss 6.8912 LearningRate 0.0586 Epoch: 4 Global Step: 58260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:15,434-Speed 3371.88 samples/sec Loss 6.8293 LearningRate 0.0586 Epoch: 4 Global Step: 58270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:18,456-Speed 3388.50 
samples/sec Loss 6.8599 LearningRate 0.0586 Epoch: 4 Global Step: 58280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:21,484-Speed 3383.64 samples/sec Loss 6.9423 LearningRate 0.0586 Epoch: 4 Global Step: 58290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:24,504-Speed 3391.80 samples/sec Loss 6.7795 LearningRate 0.0586 Epoch: 4 Global Step: 58300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:27,513-Speed 3403.98 samples/sec Loss 6.8491 LearningRate 0.0586 Epoch: 4 Global Step: 58310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:30,624-Speed 3292.52 samples/sec Loss 6.8058 LearningRate 0.0586 Epoch: 4 Global Step: 58320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:33,661-Speed 3373.22 samples/sec Loss 6.9566 LearningRate 0.0586 Epoch: 4 Global Step: 58330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:36,728-Speed 3339.51 samples/sec Loss 6.7720 LearningRate 0.0585 Epoch: 4 Global Step: 58340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:45:39,834-Speed 3297.76 samples/sec Loss 6.8309 LearningRate 0.0585 Epoch: 4 Global Step: 58350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:45:42,916-Speed 3323.21 samples/sec Loss 6.7664 LearningRate 0.0585 Epoch: 4 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:45:45,914-Speed 3416.52 samples/sec Loss 6.9046 LearningRate 0.0585 Epoch: 4 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:45:48,990-Speed 3330.53 samples/sec Loss 6.9057 LearningRate 0.0585 Epoch: 4 Global Step: 58380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:45:52,006-Speed 3396.18 samples/sec Loss 6.9372 LearningRate 0.0585 Epoch: 4 Global Step: 58390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:55,048-Speed 3367.81 samples/sec Loss 6.9416 LearningRate 0.0585 Epoch: 4 
Global Step: 58400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:45:58,061-Speed 3399.62 samples/sec Loss 6.8209 LearningRate 0.0585 Epoch: 4 Global Step: 58410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:01,088-Speed 3384.69 samples/sec Loss 6.7575 LearningRate 0.0585 Epoch: 4 Global Step: 58420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:04,193-Speed 3297.96 samples/sec Loss 6.8543 LearningRate 0.0585 Epoch: 4 Global Step: 58430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:07,288-Speed 3309.90 samples/sec Loss 6.9249 LearningRate 0.0585 Epoch: 4 Global Step: 58440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:10,310-Speed 3390.03 samples/sec Loss 6.8382 LearningRate 0.0585 Epoch: 4 Global Step: 58450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:13,429-Speed 3283.38 samples/sec Loss 6.9891 LearningRate 0.0585 Epoch: 4 Global Step: 58460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:16,539-Speed 3293.99 samples/sec Loss 6.9140 LearningRate 0.0585 Epoch: 4 Global Step: 58470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:19,568-Speed 3382.13 samples/sec Loss 6.8529 LearningRate 0.0585 Epoch: 4 Global Step: 58480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:22,586-Speed 3393.17 samples/sec Loss 6.7759 LearningRate 0.0585 Epoch: 4 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:46:25,671-Speed 3320.75 samples/sec Loss 6.7626 LearningRate 0.0584 Epoch: 4 Global Step: 58500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:46:28,739-Speed 3339.40 samples/sec Loss 6.8882 LearningRate 0.0584 Epoch: 4 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:46:31,787-Speed 3359.81 samples/sec Loss 7.0252 LearningRate 0.0584 Epoch: 4 Global Step: 58520 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:46:34,843-Speed 3352.35 samples/sec Loss 6.7675 LearningRate 0.0584 Epoch: 4 Global Step: 58530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:37,880-Speed 3372.76 samples/sec Loss 6.9054 LearningRate 0.0584 Epoch: 4 Global Step: 58540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:40,900-Speed 3391.86 samples/sec Loss 6.8411 LearningRate 0.0584 Epoch: 4 Global Step: 58550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:43,941-Speed 3368.02 samples/sec Loss 6.8144 LearningRate 0.0584 Epoch: 4 Global Step: 58560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:46,992-Speed 3357.82 samples/sec Loss 6.9970 LearningRate 0.0584 Epoch: 4 Global Step: 58570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:50,069-Speed 3328.60 samples/sec Loss 6.8481 LearningRate 0.0584 Epoch: 4 Global Step: 58580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:53,151-Speed 3323.69 samples/sec Loss 6.8006 LearningRate 0.0584 Epoch: 4 Global Step: 58590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:56,252-Speed 3303.94 samples/sec Loss 6.9491 LearningRate 0.0584 Epoch: 4 Global Step: 58600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:46:59,303-Speed 3357.15 samples/sec Loss 6.8560 LearningRate 0.0584 Epoch: 4 Global Step: 58610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:02,367-Speed 3342.25 samples/sec Loss 6.9405 LearningRate 0.0584 Epoch: 4 Global Step: 58620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:05,432-Speed 3343.27 samples/sec Loss 6.9244 LearningRate 0.0584 Epoch: 4 Global Step: 58630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:08,465-Speed 3376.80 samples/sec Loss 6.8990 LearningRate 0.0584 Epoch: 4 Global Step: 58640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:11,495-Speed 3380.25 
samples/sec Loss 6.7174 LearningRate 0.0584 Epoch: 4 Global Step: 58650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:14,577-Speed 3324.16 samples/sec Loss 6.8047 LearningRate 0.0583 Epoch: 4 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:17,636-Speed 3347.77 samples/sec Loss 6.8142 LearningRate 0.0583 Epoch: 4 Global Step: 58670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:20,669-Speed 3377.37 samples/sec Loss 6.9884 LearningRate 0.0583 Epoch: 4 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:23,698-Speed 3381.96 samples/sec Loss 6.8452 LearningRate 0.0583 Epoch: 4 Global Step: 58690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:26,767-Speed 3337.98 samples/sec Loss 6.8291 LearningRate 0.0583 Epoch: 4 Global Step: 58700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:29,842-Speed 3331.24 samples/sec Loss 6.8652 LearningRate 0.0583 Epoch: 4 Global Step: 58710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:32,888-Speed 3362.30 samples/sec Loss 6.8208 LearningRate 0.0583 Epoch: 4 Global Step: 58720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:35,939-Speed 3357.62 samples/sec Loss 6.7956 LearningRate 0.0583 Epoch: 4 Global Step: 58730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:38,989-Speed 3359.44 samples/sec Loss 6.7771 LearningRate 0.0583 Epoch: 4 Global Step: 58740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:42,079-Speed 3314.62 samples/sec Loss 6.8061 LearningRate 0.0583 Epoch: 4 Global Step: 58750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:45,121-Speed 3366.69 samples/sec Loss 6.8241 LearningRate 0.0583 Epoch: 4 Global Step: 58760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:48,187-Speed 3341.33 samples/sec Loss 6.8172 LearningRate 0.0583 Epoch: 4 
Global Step: 58770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:51,215-Speed 3382.75 samples/sec Loss 6.8542 LearningRate 0.0583 Epoch: 4 Global Step: 58780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:47:54,248-Speed 3376.95 samples/sec Loss 7.0127 LearningRate 0.0583 Epoch: 4 Global Step: 58790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:47:57,295-Speed 3363.00 samples/sec Loss 6.7668 LearningRate 0.0583 Epoch: 4 Global Step: 58800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:48:00,398-Speed 3300.67 samples/sec Loss 6.8686 LearningRate 0.0583 Epoch: 4 Global Step: 58810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:03,436-Speed 3371.00 samples/sec Loss 6.9717 LearningRate 0.0583 Epoch: 4 Global Step: 58820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:06,489-Speed 3355.69 samples/sec Loss 6.9044 LearningRate 0.0582 Epoch: 4 Global Step: 58830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:09,526-Speed 3373.01 samples/sec Loss 7.1049 LearningRate 0.0582 Epoch: 4 Global Step: 58840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:12,566-Speed 3368.71 samples/sec Loss 6.7485 LearningRate 0.0582 Epoch: 4 Global Step: 58850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:15,619-Speed 3356.27 samples/sec Loss 6.7009 LearningRate 0.0582 Epoch: 4 Global Step: 58860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:18,688-Speed 3337.56 samples/sec Loss 6.8327 LearningRate 0.0582 Epoch: 4 Global Step: 58870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:21,688-Speed 3413.53 samples/sec Loss 6.8425 LearningRate 0.0582 Epoch: 4 Global Step: 58880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:24,706-Speed 3395.21 samples/sec Loss 6.8608 LearningRate 0.0582 Epoch: 4 Global Step: 58890 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:48:27,729-Speed 3387.98 samples/sec Loss 6.8790 LearningRate 0.0582 Epoch: 4 Global Step: 58900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:30,767-Speed 3371.52 samples/sec Loss 6.7943 LearningRate 0.0582 Epoch: 4 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:48:33,803-Speed 3374.16 samples/sec Loss 6.8115 LearningRate 0.0582 Epoch: 4 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:48:36,821-Speed 3394.15 samples/sec Loss 6.9341 LearningRate 0.0582 Epoch: 4 Global Step: 58930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:39,867-Speed 3363.16 samples/sec Loss 6.9132 LearningRate 0.0582 Epoch: 4 Global Step: 58940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:42,979-Speed 3291.61 samples/sec Loss 6.8710 LearningRate 0.0582 Epoch: 4 Global Step: 58950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:46,016-Speed 3372.74 samples/sec Loss 6.9318 LearningRate 0.0582 Epoch: 4 Global Step: 58960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:49,068-Speed 3355.52 samples/sec Loss 6.8760 LearningRate 0.0582 Epoch: 4 Global Step: 58970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:52,143-Speed 3331.41 samples/sec Loss 6.8281 LearningRate 0.0582 Epoch: 4 Global Step: 58980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:55,181-Speed 3372.00 samples/sec Loss 6.8749 LearningRate 0.0581 Epoch: 4 Global Step: 58990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:48:58,217-Speed 3374.30 samples/sec Loss 6.8892 LearningRate 0.0581 Epoch: 4 Global Step: 59000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:01,283-Speed 3339.74 samples/sec Loss 6.8914 LearningRate 0.0581 Epoch: 4 Global Step: 59010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:04,337-Speed 3354.14 
samples/sec Loss 6.7380 LearningRate 0.0581 Epoch: 4 Global Step: 59020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:07,352-Speed 3398.18 samples/sec Loss 6.8460 LearningRate 0.0581 Epoch: 4 Global Step: 59030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:49:10,374-Speed 3390.01 samples/sec Loss 6.8912 LearningRate 0.0581 Epoch: 4 Global Step: 59040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:49:13,485-Speed 3292.41 samples/sec Loss 6.8332 LearningRate 0.0581 Epoch: 4 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:49:16,521-Speed 3373.67 samples/sec Loss 6.8169 LearningRate 0.0581 Epoch: 4 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:49:19,549-Speed 3383.06 samples/sec Loss 6.9359 LearningRate 0.0581 Epoch: 4 Global Step: 59070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:22,567-Speed 3394.41 samples/sec Loss 6.8918 LearningRate 0.0581 Epoch: 4 Global Step: 59080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:25,599-Speed 3378.15 samples/sec Loss 6.9524 LearningRate 0.0581 Epoch: 4 Global Step: 59090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:28,679-Speed 3325.76 samples/sec Loss 6.9131 LearningRate 0.0581 Epoch: 4 Global Step: 59100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:31,761-Speed 3323.38 samples/sec Loss 6.8469 LearningRate 0.0581 Epoch: 4 Global Step: 59110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:34,821-Speed 3347.98 samples/sec Loss 6.7803 LearningRate 0.0581 Epoch: 4 Global Step: 59120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:37,871-Speed 3358.60 samples/sec Loss 6.8157 LearningRate 0.0581 Epoch: 4 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:40,925-Speed 3354.26 samples/sec Loss 6.8206 LearningRate 0.0581 Epoch: 4 
Global Step: 59140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:43,958-Speed 3377.71 samples/sec Loss 6.7974 LearningRate 0.0580 Epoch: 4 Global Step: 59150 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:46,962-Speed 3409.47 samples/sec Loss 6.9391 LearningRate 0.0580 Epoch: 4 Global Step: 59160 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:49,971-Speed 3404.44 samples/sec Loss 6.9036 LearningRate 0.0580 Epoch: 4 Global Step: 59170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:49:53,040-Speed 3337.31 samples/sec Loss 6.8421 LearningRate 0.0580 Epoch: 4 Global Step: 59180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:49:56,085-Speed 3363.95 samples/sec Loss 6.8341 LearningRate 0.0580 Epoch: 4 Global Step: 59190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:49:59,097-Speed 3400.80 samples/sec Loss 6.9249 LearningRate 0.0580 Epoch: 4 Global Step: 59200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:02,157-Speed 3347.40 samples/sec Loss 7.0139 LearningRate 0.0580 Epoch: 4 Global Step: 59210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:05,221-Speed 3343.18 samples/sec Loss 6.7666 LearningRate 0.0580 Epoch: 4 Global Step: 59220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:08,291-Speed 3336.65 samples/sec Loss 6.9209 LearningRate 0.0580 Epoch: 4 Global Step: 59230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:11,359-Speed 3338.65 samples/sec Loss 6.9376 LearningRate 0.0580 Epoch: 4 Global Step: 59240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:14,405-Speed 3363.57 samples/sec Loss 6.7687 LearningRate 0.0580 Epoch: 4 Global Step: 59250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:17,483-Speed 3327.15 samples/sec Loss 6.8917 LearningRate 0.0580 Epoch: 4 Global Step: 59260 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:50:20,503-Speed 3391.79 samples/sec Loss 6.6992 LearningRate 0.0580 Epoch: 4 Global Step: 59270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:23,550-Speed 3362.15 samples/sec Loss 6.9091 LearningRate 0.0580 Epoch: 4 Global Step: 59280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:50:26,629-Speed 3326.99 samples/sec Loss 6.9106 LearningRate 0.0580 Epoch: 4 Global Step: 59290 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:29,688-Speed 3347.64 samples/sec Loss 6.9071 LearningRate 0.0580 Epoch: 4 Global Step: 59300 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:32,725-Speed 3372.98 samples/sec Loss 6.8392 LearningRate 0.0580 Epoch: 4 Global Step: 59310 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:35,764-Speed 3371.10 samples/sec Loss 6.8121 LearningRate 0.0579 Epoch: 4 Global Step: 59320 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:38,822-Speed 3349.37 samples/sec Loss 6.8036 LearningRate 0.0579 Epoch: 4 Global Step: 59330 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:41,869-Speed 3361.71 samples/sec Loss 6.8684 LearningRate 0.0579 Epoch: 4 Global Step: 59340 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:44,870-Speed 3413.30 samples/sec Loss 6.7599 LearningRate 0.0579 Epoch: 4 Global Step: 59350 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:47,896-Speed 3385.82 samples/sec Loss 6.9887 LearningRate 0.0579 Epoch: 4 Global Step: 59360 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:50,971-Speed 3330.93 samples/sec Loss 6.8492 LearningRate 0.0579 Epoch: 4 Global Step: 59370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:54,002-Speed 3379.59 samples/sec Loss 6.8652 LearningRate 0.0579 Epoch: 4 Global Step: 59380 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:50:57,056-Speed 3354.89 
samples/sec Loss 6.8601 LearningRate 0.0579 Epoch: 4 Global Step: 59390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:00,092-Speed 3373.47 samples/sec Loss 6.9178 LearningRate 0.0579 Epoch: 4 Global Step: 59400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:03,181-Speed 3315.62 samples/sec Loss 6.8452 LearningRate 0.0579 Epoch: 4 Global Step: 59410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:06,245-Speed 3343.67 samples/sec Loss 6.6904 LearningRate 0.0579 Epoch: 4 Global Step: 59420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:09,247-Speed 3411.60 samples/sec Loss 6.9014 LearningRate 0.0579 Epoch: 4 Global Step: 59430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:12,327-Speed 3326.36 samples/sec Loss 6.8769 LearningRate 0.0579 Epoch: 4 Global Step: 59440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:15,337-Speed 3403.69 samples/sec Loss 6.9153 LearningRate 0.0579 Epoch: 4 Global Step: 59450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:18,355-Speed 3393.41 samples/sec Loss 6.8318 LearningRate 0.0579 Epoch: 4 Global Step: 59460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:21,373-Speed 3393.47 samples/sec Loss 6.9567 LearningRate 0.0579 Epoch: 4 Global Step: 59470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:24,420-Speed 3362.78 samples/sec Loss 6.7377 LearningRate 0.0578 Epoch: 4 Global Step: 59480 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:27,518-Speed 3306.22 samples/sec Loss 6.8182 LearningRate 0.0578 Epoch: 4 Global Step: 59490 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:30,651-Speed 3269.37 samples/sec Loss 6.9027 LearningRate 0.0578 Epoch: 4 Global Step: 59500 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:33,670-Speed 3392.61 samples/sec Loss 6.8859 LearningRate 0.0578 Epoch: 4 
Global Step: 59510 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:36,741-Speed 3335.91 samples/sec Loss 6.8879 LearningRate 0.0578 Epoch: 4 Global Step: 59520 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:39,845-Speed 3299.63 samples/sec Loss 6.7708 LearningRate 0.0578 Epoch: 4 Global Step: 59530 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:42,894-Speed 3359.90 samples/sec Loss 6.8392 LearningRate 0.0578 Epoch: 4 Global Step: 59540 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:45,941-Speed 3362.04 samples/sec Loss 6.8645 LearningRate 0.0578 Epoch: 4 Global Step: 59550 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:49,084-Speed 3258.33 samples/sec Loss 6.8177 LearningRate 0.0578 Epoch: 4 Global Step: 59560 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:52,220-Speed 3266.67 samples/sec Loss 6.9525 LearningRate 0.0578 Epoch: 4 Global Step: 59570 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:51:55,297-Speed 3329.56 samples/sec Loss 6.8972 LearningRate 0.0578 Epoch: 4 Global Step: 59580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:51:58,304-Speed 3406.01 samples/sec Loss 6.8710 LearningRate 0.0578 Epoch: 4 Global Step: 59590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:01,328-Speed 3387.67 samples/sec Loss 6.8373 LearningRate 0.0578 Epoch: 4 Global Step: 59600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:04,347-Speed 3392.49 samples/sec Loss 6.7760 LearningRate 0.0578 Epoch: 4 Global Step: 59610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:07,444-Speed 3307.57 samples/sec Loss 6.8407 LearningRate 0.0578 Epoch: 4 Global Step: 59620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:10,465-Speed 3390.48 samples/sec Loss 6.8014 LearningRate 0.0578 Epoch: 4 Global Step: 59630 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:52:13,512-Speed 3362.04 samples/sec Loss 6.8122 LearningRate 0.0577 Epoch: 4 Global Step: 59640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:16,552-Speed 3370.34 samples/sec Loss 6.9889 LearningRate 0.0577 Epoch: 4 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:19,592-Speed 3368.91 samples/sec Loss 7.0050 LearningRate 0.0577 Epoch: 4 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:22,602-Speed 3402.86 samples/sec Loss 6.9689 LearningRate 0.0577 Epoch: 4 Global Step: 59670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:52:25,624-Speed 3390.06 samples/sec Loss 6.8334 LearningRate 0.0577 Epoch: 4 Global Step: 59680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:28,629-Speed 3409.02 samples/sec Loss 6.7783 LearningRate 0.0577 Epoch: 4 Global Step: 59690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:31,634-Speed 3407.85 samples/sec Loss 6.8621 LearningRate 0.0577 Epoch: 4 Global Step: 59700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:34,660-Speed 3386.05 samples/sec Loss 6.8378 LearningRate 0.0577 Epoch: 4 Global Step: 59710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:37,696-Speed 3373.95 samples/sec Loss 6.9185 LearningRate 0.0577 Epoch: 4 Global Step: 59720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:40,802-Speed 3297.45 samples/sec Loss 6.8172 LearningRate 0.0577 Epoch: 4 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:43,822-Speed 3391.66 samples/sec Loss 6.8690 LearningRate 0.0577 Epoch: 4 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:46,905-Speed 3322.25 samples/sec Loss 6.8828 LearningRate 0.0577 Epoch: 4 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:49,986-Speed 3324.67 
samples/sec Loss 6.9097 LearningRate 0.0577 Epoch: 4 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:53,102-Speed 3287.86 samples/sec Loss 6.9856 LearningRate 0.0577 Epoch: 4 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:52:56,127-Speed 3385.43 samples/sec Loss 6.8432 LearningRate 0.0577 Epoch: 4 Global Step: 59780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 06:52:59,173-Speed 3363.56 samples/sec Loss 6.9011 LearningRate 0.0577 Epoch: 4 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 06:53:02,205-Speed 3378.66 samples/sec Loss 6.8600 LearningRate 0.0577 Epoch: 4 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:53:05,239-Speed 3376.11 samples/sec Loss 7.0104 LearningRate 0.0576 Epoch: 4 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:53:08,257-Speed 3394.56 samples/sec Loss 6.8879 LearningRate 0.0576 Epoch: 4 Global Step: 59820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:11,309-Speed 3355.19 samples/sec Loss 6.8585 LearningRate 0.0576 Epoch: 4 Global Step: 59830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:14,344-Speed 3375.78 samples/sec Loss 6.7660 LearningRate 0.0576 Epoch: 4 Global Step: 59840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:17,342-Speed 3417.08 samples/sec Loss 6.8231 LearningRate 0.0576 Epoch: 4 Global Step: 59850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:20,410-Speed 3338.74 samples/sec Loss 6.8316 LearningRate 0.0576 Epoch: 4 Global Step: 59860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:23,466-Speed 3352.13 samples/sec Loss 6.8923 LearningRate 0.0576 Epoch: 4 Global Step: 59870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:26,549-Speed 3322.22 samples/sec Loss 6.8362 LearningRate 0.0576 Epoch: 4 
Global Step: 59880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:29,683-Speed 3267.95 samples/sec Loss 6.9188 LearningRate 0.0576 Epoch: 4 Global Step: 59890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:32,711-Speed 3383.18 samples/sec Loss 6.8603 LearningRate 0.0576 Epoch: 4 Global Step: 59900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:35,819-Speed 3295.85 samples/sec Loss 6.8855 LearningRate 0.0576 Epoch: 4 Global Step: 59910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:38,858-Speed 3370.84 samples/sec Loss 6.8129 LearningRate 0.0576 Epoch: 4 Global Step: 59920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:53:41,930-Speed 3334.20 samples/sec Loss 6.8276 LearningRate 0.0576 Epoch: 4 Global Step: 59930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:45,012-Speed 3323.77 samples/sec Loss 6.8649 LearningRate 0.0576 Epoch: 4 Global Step: 59940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:48,066-Speed 3354.48 samples/sec Loss 6.8882 LearningRate 0.0576 Epoch: 4 Global Step: 59950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:51,170-Speed 3298.89 samples/sec Loss 6.8433 LearningRate 0.0576 Epoch: 4 Global Step: 59960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:54,212-Speed 3367.20 samples/sec Loss 6.9045 LearningRate 0.0575 Epoch: 4 Global Step: 59970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:53:57,238-Speed 3385.34 samples/sec Loss 6.8157 LearningRate 0.0575 Epoch: 4 Global Step: 59980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:00,275-Speed 3373.76 samples/sec Loss 6.8342 LearningRate 0.0575 Epoch: 4 Global Step: 59990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:03,400-Speed 3277.28 samples/sec Loss 6.8400 LearningRate 0.0575 Epoch: 4 Global Step: 60000 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:54:06,498-Speed 3306.66 samples/sec Loss 6.7646 LearningRate 0.0575 Epoch: 4 Global Step: 60010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:09,533-Speed 3374.85 samples/sec Loss 6.8878 LearningRate 0.0575 Epoch: 4 Global Step: 60020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:12,581-Speed 3360.17 samples/sec Loss 6.9313 LearningRate 0.0575 Epoch: 4 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:54:15,663-Speed 3324.53 samples/sec Loss 6.7549 LearningRate 0.0575 Epoch: 4 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:54:18,711-Speed 3359.81 samples/sec Loss 6.8571 LearningRate 0.0575 Epoch: 4 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:54:21,709-Speed 3416.99 samples/sec Loss 6.8446 LearningRate 0.0575 Epoch: 4 Global Step: 60060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:24,761-Speed 3357.19 samples/sec Loss 6.9370 LearningRate 0.0575 Epoch: 4 Global Step: 60070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:27,852-Speed 3313.41 samples/sec Loss 6.8591 LearningRate 0.0575 Epoch: 4 Global Step: 60080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:30,910-Speed 3349.73 samples/sec Loss 6.8674 LearningRate 0.0575 Epoch: 4 Global Step: 60090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:33,934-Speed 3386.58 samples/sec Loss 6.7762 LearningRate 0.0575 Epoch: 4 Global Step: 60100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:36,983-Speed 3360.67 samples/sec Loss 6.8779 LearningRate 0.0575 Epoch: 4 Global Step: 60110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:39,987-Speed 3409.75 samples/sec Loss 6.8417 LearningRate 0.0575 Epoch: 4 Global Step: 60120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:43,065-Speed 3327.18 
samples/sec Loss 6.9235 LearningRate 0.0574 Epoch: 4 Global Step: 60130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:46,116-Speed 3356.94 samples/sec Loss 6.9361 LearningRate 0.0574 Epoch: 4 Global Step: 60140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:49,126-Speed 3404.14 samples/sec Loss 6.8278 LearningRate 0.0574 Epoch: 4 Global Step: 60150 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:54:52,159-Speed 3377.33 samples/sec Loss 6.8612 LearningRate 0.0574 Epoch: 4 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:54:55,219-Speed 3347.60 samples/sec Loss 6.7933 LearningRate 0.0574 Epoch: 4 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:54:58,235-Speed 3396.20 samples/sec Loss 6.4992 LearningRate 0.0574 Epoch: 4 Global Step: 60180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:01,250-Speed 3397.93 samples/sec Loss 6.8014 LearningRate 0.0574 Epoch: 4 Global Step: 60190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:04,262-Speed 3400.58 samples/sec Loss 6.8765 LearningRate 0.0574 Epoch: 4 Global Step: 60200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:07,294-Speed 3378.23 samples/sec Loss 6.8522 LearningRate 0.0574 Epoch: 4 Global Step: 60210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:10,315-Speed 3390.92 samples/sec Loss 6.8597 LearningRate 0.0574 Epoch: 4 Global Step: 60220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:13,412-Speed 3307.44 samples/sec Loss 6.9089 LearningRate 0.0574 Epoch: 4 Global Step: 60230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:16,495-Speed 3322.73 samples/sec Loss 6.9087 LearningRate 0.0574 Epoch: 4 Global Step: 60240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:19,596-Speed 3302.93 samples/sec Loss 6.8724 LearningRate 0.0574 Epoch: 4 
Global Step: 60250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:22,599-Speed 3410.78 samples/sec Loss 6.9834 LearningRate 0.0574 Epoch: 4 Global Step: 60260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:25,661-Speed 3346.23 samples/sec Loss 6.7929 LearningRate 0.0574 Epoch: 4 Global Step: 60270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:28,751-Speed 3314.89 samples/sec Loss 6.7549 LearningRate 0.0574 Epoch: 4 Global Step: 60280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:55:31,879-Speed 3273.96 samples/sec Loss 6.9173 LearningRate 0.0574 Epoch: 4 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:55:34,944-Speed 3342.89 samples/sec Loss 6.8133 LearningRate 0.0573 Epoch: 4 Global Step: 60300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:55:37,980-Speed 3373.51 samples/sec Loss 6.9829 LearningRate 0.0573 Epoch: 4 Global Step: 60310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:55:41,043-Speed 3343.59 samples/sec Loss 6.8337 LearningRate 0.0573 Epoch: 4 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:55:44,057-Speed 3399.16 samples/sec Loss 6.6742 LearningRate 0.0573 Epoch: 4 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:55:47,102-Speed 3363.18 samples/sec Loss 6.8204 LearningRate 0.0573 Epoch: 4 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:55:50,145-Speed 3366.61 samples/sec Loss 6.8198 LearningRate 0.0573 Epoch: 4 Global Step: 60350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:53,190-Speed 3364.65 samples/sec Loss 6.8867 LearningRate 0.0573 Epoch: 4 Global Step: 60360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:55:56,273-Speed 3322.02 samples/sec Loss 6.8249 LearningRate 0.0573 Epoch: 4 Global Step: 60370 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:55:59,341-Speed 3338.79 samples/sec Loss 6.8311 LearningRate 0.0573 Epoch: 4 Global Step: 60380 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:02,471-Speed 3272.84 samples/sec Loss 6.7728 LearningRate 0.0573 Epoch: 4 Global Step: 60390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:05,524-Speed 3355.32 samples/sec Loss 6.7645 LearningRate 0.0573 Epoch: 4 Global Step: 60400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:08,530-Speed 3407.11 samples/sec Loss 6.6529 LearningRate 0.0573 Epoch: 4 Global Step: 60410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:11,572-Speed 3367.37 samples/sec Loss 6.6542 LearningRate 0.0573 Epoch: 4 Global Step: 60420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:14,606-Speed 3376.36 samples/sec Loss 6.8398 LearningRate 0.0573 Epoch: 4 Global Step: 60430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:17,671-Speed 3342.06 samples/sec Loss 6.7240 LearningRate 0.0573 Epoch: 4 Global Step: 60440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:20,741-Speed 3336.37 samples/sec Loss 6.8990 LearningRate 0.0573 Epoch: 4 Global Step: 60450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:23,789-Speed 3360.89 samples/sec Loss 6.9240 LearningRate 0.0572 Epoch: 4 Global Step: 60460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:26,829-Speed 3369.06 samples/sec Loss 6.9851 LearningRate 0.0572 Epoch: 4 Global Step: 60470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:29,911-Speed 3323.62 samples/sec Loss 6.9582 LearningRate 0.0572 Epoch: 4 Global Step: 60480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:32,986-Speed 3330.98 samples/sec Loss 6.9067 LearningRate 0.0572 Epoch: 4 Global Step: 60490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:36,111-Speed 3277.72 
samples/sec Loss 6.8215 LearningRate 0.0572 Epoch: 4 Global Step: 60500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:39,210-Speed 3305.58 samples/sec Loss 6.8487 LearningRate 0.0572 Epoch: 4 Global Step: 60510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:42,343-Speed 3269.52 samples/sec Loss 6.9316 LearningRate 0.0572 Epoch: 4 Global Step: 60520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:45,361-Speed 3393.88 samples/sec Loss 6.8086 LearningRate 0.0572 Epoch: 4 Global Step: 60530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:48,410-Speed 3360.33 samples/sec Loss 6.8678 LearningRate 0.0572 Epoch: 4 Global Step: 60540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:51,499-Speed 3316.27 samples/sec Loss 6.9454 LearningRate 0.0572 Epoch: 4 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:56:54,559-Speed 3347.75 samples/sec Loss 6.8537 LearningRate 0.0572 Epoch: 4 Global Step: 60560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:56:57,573-Speed 3398.06 samples/sec Loss 6.8302 LearningRate 0.0572 Epoch: 4 Global Step: 60570 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:00,618-Speed 3363.74 samples/sec Loss 6.8074 LearningRate 0.0572 Epoch: 4 Global Step: 60580 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:03,740-Speed 3281.05 samples/sec Loss 6.9219 LearningRate 0.0572 Epoch: 4 Global Step: 60590 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:06,784-Speed 3366.22 samples/sec Loss 6.9690 LearningRate 0.0572 Epoch: 4 Global Step: 60600 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:09,812-Speed 3382.43 samples/sec Loss 6.9402 LearningRate 0.0572 Epoch: 4 Global Step: 60610 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:12,879-Speed 3339.78 samples/sec Loss 6.7963 LearningRate 0.0572 Epoch: 4 
Global Step: 60620 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:15,930-Speed 3357.72 samples/sec Loss 6.8431 LearningRate 0.0571 Epoch: 4 Global Step: 60630 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:19,021-Speed 3313.51 samples/sec Loss 6.7979 LearningRate 0.0571 Epoch: 4 Global Step: 60640 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:22,034-Speed 3399.42 samples/sec Loss 6.8972 LearningRate 0.0571 Epoch: 4 Global Step: 60650 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:25,069-Speed 3375.70 samples/sec Loss 6.7820 LearningRate 0.0571 Epoch: 4 Global Step: 60660 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 06:57:28,138-Speed 3338.42 samples/sec Loss 6.7289 LearningRate 0.0571 Epoch: 4 Global Step: 60670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:31,196-Speed 3349.00 samples/sec Loss 6.9107 LearningRate 0.0571 Epoch: 4 Global Step: 60680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:34,249-Speed 3355.71 samples/sec Loss 6.8043 LearningRate 0.0571 Epoch: 4 Global Step: 60690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:37,317-Speed 3338.35 samples/sec Loss 6.9150 LearningRate 0.0571 Epoch: 4 Global Step: 60700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:40,390-Speed 3333.04 samples/sec Loss 6.8748 LearningRate 0.0571 Epoch: 4 Global Step: 60710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:43,449-Speed 3348.30 samples/sec Loss 6.8695 LearningRate 0.0571 Epoch: 4 Global Step: 60720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:46,502-Speed 3355.45 samples/sec Loss 6.9112 LearningRate 0.0571 Epoch: 4 Global Step: 60730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:49,585-Speed 3322.46 samples/sec Loss 6.7513 LearningRate 0.0571 Epoch: 4 Global Step: 60740 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:57:52,654-Speed 3337.79 samples/sec Loss 6.8147 LearningRate 0.0571 Epoch: 4 Global Step: 60750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:55,729-Speed 3330.92 samples/sec Loss 6.7607 LearningRate 0.0571 Epoch: 4 Global Step: 60760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:57:58,759-Speed 3381.15 samples/sec Loss 6.8483 LearningRate 0.0571 Epoch: 4 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:01,856-Speed 3307.58 samples/sec Loss 6.7286 LearningRate 0.0571 Epoch: 4 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:04,921-Speed 3341.78 samples/sec Loss 6.8410 LearningRate 0.0570 Epoch: 4 Global Step: 60790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:07,940-Speed 3392.38 samples/sec Loss 6.8240 LearningRate 0.0570 Epoch: 4 Global Step: 60800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:10,966-Speed 3385.84 samples/sec Loss 6.8828 LearningRate 0.0570 Epoch: 4 Global Step: 60810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:14,041-Speed 3331.11 samples/sec Loss 6.8401 LearningRate 0.0570 Epoch: 4 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:17,136-Speed 3309.73 samples/sec Loss 6.7976 LearningRate 0.0570 Epoch: 4 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:20,159-Speed 3387.89 samples/sec Loss 6.7710 LearningRate 0.0570 Epoch: 4 Global Step: 60840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:23,243-Speed 3321.95 samples/sec Loss 6.7916 LearningRate 0.0570 Epoch: 4 Global Step: 60850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:26,299-Speed 3352.22 samples/sec Loss 6.8052 LearningRate 0.0570 Epoch: 4 Global Step: 60860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:29,407-Speed 3295.82 
samples/sec Loss 6.7463 LearningRate 0.0570 Epoch: 4 Global Step: 60870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:32,534-Speed 3275.70 samples/sec Loss 6.8954 LearningRate 0.0570 Epoch: 4 Global Step: 60880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:35,562-Speed 3383.97 samples/sec Loss 6.8420 LearningRate 0.0570 Epoch: 4 Global Step: 60890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:38,617-Speed 3352.92 samples/sec Loss 6.8232 LearningRate 0.0570 Epoch: 4 Global Step: 60900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:41,715-Speed 3306.73 samples/sec Loss 6.8141 LearningRate 0.0570 Epoch: 4 Global Step: 60910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:58:44,782-Speed 3338.97 samples/sec Loss 6.6827 LearningRate 0.0570 Epoch: 4 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:47,833-Speed 3357.69 samples/sec Loss 6.8562 LearningRate 0.0570 Epoch: 4 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:50,884-Speed 3357.72 samples/sec Loss 6.8457 LearningRate 0.0570 Epoch: 4 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:53,912-Speed 3382.96 samples/sec Loss 6.8287 LearningRate 0.0569 Epoch: 4 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:56,943-Speed 3379.70 samples/sec Loss 6.7331 LearningRate 0.0569 Epoch: 4 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:58:59,968-Speed 3386.30 samples/sec Loss 6.8255 LearningRate 0.0569 Epoch: 4 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:03,018-Speed 3357.25 samples/sec Loss 6.8049 LearningRate 0.0569 Epoch: 4 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:06,072-Speed 3354.47 samples/sec Loss 6.8959 LearningRate 0.0569 Epoch: 4 
Global Step: 60990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:09,113-Speed 3368.12 samples/sec Loss 6.8197 LearningRate 0.0569 Epoch: 4 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:12,150-Speed 3372.82 samples/sec Loss 6.7478 LearningRate 0.0569 Epoch: 4 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:15,226-Speed 3330.48 samples/sec Loss 6.7780 LearningRate 0.0569 Epoch: 4 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 06:59:18,251-Speed 3386.45 samples/sec Loss 6.7319 LearningRate 0.0569 Epoch: 4 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:21,268-Speed 3395.09 samples/sec Loss 6.8983 LearningRate 0.0569 Epoch: 4 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:24,308-Speed 3368.99 samples/sec Loss 6.8113 LearningRate 0.0569 Epoch: 4 Global Step: 61050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:27,352-Speed 3365.41 samples/sec Loss 6.7691 LearningRate 0.0569 Epoch: 4 Global Step: 61060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:30,407-Speed 3353.44 samples/sec Loss 6.8697 LearningRate 0.0569 Epoch: 4 Global Step: 61070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:33,437-Speed 3380.62 samples/sec Loss 6.8574 LearningRate 0.0569 Epoch: 4 Global Step: 61080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:36,494-Speed 3351.01 samples/sec Loss 6.8880 LearningRate 0.0569 Epoch: 4 Global Step: 61090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:39,508-Speed 3398.49 samples/sec Loss 6.8258 LearningRate 0.0569 Epoch: 4 Global Step: 61100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:42,589-Speed 3324.14 samples/sec Loss 6.7343 LearningRate 0.0569 Epoch: 4 Global Step: 61110 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 06:59:45,628-Speed 3370.55 samples/sec Loss 6.8224 LearningRate 0.0568 Epoch: 4 Global Step: 61120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:48,665-Speed 3373.13 samples/sec Loss 6.8257 LearningRate 0.0568 Epoch: 4 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 06:59:51,703-Speed 3372.29 samples/sec Loss 6.9748 LearningRate 0.0568 Epoch: 4 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:54,779-Speed 3330.24 samples/sec Loss 6.9924 LearningRate 0.0568 Epoch: 4 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 06:59:57,805-Speed 3384.81 samples/sec Loss 6.8732 LearningRate 0.0568 Epoch: 4 Global Step: 61160 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:00:00,940-Speed 3267.75 samples/sec Loss 6.6608 LearningRate 0.0568 Epoch: 4 Global Step: 61170 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:04,026-Speed 3319.04 samples/sec Loss 6.7044 LearningRate 0.0568 Epoch: 4 Global Step: 61180 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:07,054-Speed 3383.17 samples/sec Loss 6.8848 LearningRate 0.0568 Epoch: 4 Global Step: 61190 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:10,064-Speed 3403.27 samples/sec Loss 6.7709 LearningRate 0.0568 Epoch: 4 Global Step: 61200 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:13,105-Speed 3367.59 samples/sec Loss 6.7733 LearningRate 0.0568 Epoch: 4 Global Step: 61210 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:16,207-Speed 3302.43 samples/sec Loss 6.9067 LearningRate 0.0568 Epoch: 4 Global Step: 61220 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:19,326-Speed 3284.08 samples/sec Loss 6.7800 LearningRate 0.0568 Epoch: 4 Global Step: 61230 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:22,361-Speed 3375.72 
samples/sec Loss 6.7494 LearningRate 0.0568 Epoch: 4 Global Step: 61240 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:25,396-Speed 3374.99 samples/sec Loss 6.7335 LearningRate 0.0568 Epoch: 4 Global Step: 61250 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:28,412-Speed 3396.63 samples/sec Loss 6.8528 LearningRate 0.0568 Epoch: 4 Global Step: 61260 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:31,464-Speed 3355.42 samples/sec Loss 6.7088 LearningRate 0.0568 Epoch: 4 Global Step: 61270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:00:34,497-Speed 3378.47 samples/sec Loss 6.8585 LearningRate 0.0567 Epoch: 4 Global Step: 61280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:00:37,592-Speed 3309.17 samples/sec Loss 6.8434 LearningRate 0.0567 Epoch: 4 Global Step: 61290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:00:40,647-Speed 3352.97 samples/sec Loss 6.9098 LearningRate 0.0567 Epoch: 4 Global Step: 61300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:00:43,738-Speed 3314.63 samples/sec Loss 6.9157 LearningRate 0.0567 Epoch: 4 Global Step: 61310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:00:46,748-Speed 3403.00 samples/sec Loss 6.7621 LearningRate 0.0567 Epoch: 4 Global Step: 61320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:00:49,757-Speed 3403.38 samples/sec Loss 6.8690 LearningRate 0.0567 Epoch: 4 Global Step: 61330 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:52,833-Speed 3330.62 samples/sec Loss 6.8955 LearningRate 0.0567 Epoch: 4 Global Step: 61340 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:55,910-Speed 3328.62 samples/sec Loss 6.7355 LearningRate 0.0567 Epoch: 4 Global Step: 61350 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:00:58,955-Speed 3364.52 samples/sec Loss 6.8588 LearningRate 0.0567 Epoch: 4 
Global Step: 61360 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:01:02,072-Speed 3286.23 samples/sec Loss 6.8191 LearningRate 0.0567 Epoch: 4 Global Step: 61370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:01:05,174-Speed 3301.96 samples/sec Loss 6.8064 LearningRate 0.0567 Epoch: 4 Global Step: 61380 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:01:08,213-Speed 3370.76 samples/sec Loss 6.7187 LearningRate 0.0567 Epoch: 4 Global Step: 61390 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:01:11,259-Speed 3362.47 samples/sec Loss 6.7671 LearningRate 0.0567 Epoch: 4 Global Step: 61400 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:01:14,310-Speed 3357.53 samples/sec Loss 6.8131 LearningRate 0.0567 Epoch: 4 Global Step: 61410 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:01:17,388-Speed 3327.69 samples/sec Loss 6.8237 LearningRate 0.0567 Epoch: 4 Global Step: 61420 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:01:20,409-Speed 3391.54 samples/sec Loss 6.8251 LearningRate 0.0567 Epoch: 4 Global Step: 61430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:23,509-Speed 3303.63 samples/sec Loss 6.6873 LearningRate 0.0567 Epoch: 4 Global Step: 61440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:27,334-Speed 2678.12 samples/sec Loss 6.6634 LearningRate 0.0566 Epoch: 4 Global Step: 61450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:30,348-Speed 3398.07 samples/sec Loss 6.7510 LearningRate 0.0566 Epoch: 4 Global Step: 61460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:33,396-Speed 3360.89 samples/sec Loss 6.7336 LearningRate 0.0566 Epoch: 4 Global Step: 61470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:36,520-Speed 3278.94 samples/sec Loss 6.8510 LearningRate 0.0566 Epoch: 4 Global Step: 61480 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:01:39,577-Speed 3350.68 samples/sec Loss 6.7528 LearningRate 0.0566 Epoch: 4 Global Step: 61490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:42,632-Speed 3352.94 samples/sec Loss 6.7425 LearningRate 0.0566 Epoch: 4 Global Step: 61500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:45,674-Speed 3367.70 samples/sec Loss 6.9249 LearningRate 0.0566 Epoch: 4 Global Step: 61510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:48,690-Speed 3396.06 samples/sec Loss 6.8799 LearningRate 0.0566 Epoch: 4 Global Step: 61520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:01:51,841-Speed 3250.42 samples/sec Loss 6.7741 LearningRate 0.0566 Epoch: 4 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:01:54,895-Speed 3354.20 samples/sec Loss 6.6912 LearningRate 0.0566 Epoch: 4 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:01:57,913-Speed 3393.57 samples/sec Loss 6.8261 LearningRate 0.0566 Epoch: 4 Global Step: 61550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:00,930-Speed 3396.14 samples/sec Loss 6.7433 LearningRate 0.0566 Epoch: 4 Global Step: 61560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:03,959-Speed 3380.99 samples/sec Loss 6.7895 LearningRate 0.0566 Epoch: 4 Global Step: 61570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:07,031-Speed 3335.56 samples/sec Loss 6.8208 LearningRate 0.0566 Epoch: 4 Global Step: 61580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:10,056-Speed 3385.72 samples/sec Loss 6.8262 LearningRate 0.0566 Epoch: 4 Global Step: 61590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:13,119-Speed 3343.44 samples/sec Loss 6.9204 LearningRate 0.0566 Epoch: 4 Global Step: 61600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:16,137-Speed 3394.62 
samples/sec Loss 6.8686 LearningRate 0.0565 Epoch: 4 Global Step: 61610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:19,219-Speed 3324.02 samples/sec Loss 6.8011 LearningRate 0.0565 Epoch: 4 Global Step: 61620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:22,224-Speed 3408.77 samples/sec Loss 6.6293 LearningRate 0.0565 Epoch: 4 Global Step: 61630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:25,247-Speed 3388.24 samples/sec Loss 6.7425 LearningRate 0.0565 Epoch: 4 Global Step: 61640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:28,332-Speed 3320.17 samples/sec Loss 6.8467 LearningRate 0.0565 Epoch: 4 Global Step: 61650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:02:31,362-Speed 3380.32 samples/sec Loss 6.7981 LearningRate 0.0565 Epoch: 4 Global Step: 61660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:02:34,375-Speed 3400.15 samples/sec Loss 6.6967 LearningRate 0.0565 Epoch: 4 Global Step: 61670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:02:37,457-Speed 3324.10 samples/sec Loss 6.7405 LearningRate 0.0565 Epoch: 4 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:40,489-Speed 3377.53 samples/sec Loss 6.7483 LearningRate 0.0565 Epoch: 4 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:43,492-Speed 3411.39 samples/sec Loss 6.6710 LearningRate 0.0565 Epoch: 4 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:46,502-Speed 3403.50 samples/sec Loss 6.7780 LearningRate 0.0565 Epoch: 4 Global Step: 61710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:49,610-Speed 3295.33 samples/sec Loss 6.9827 LearningRate 0.0565 Epoch: 4 Global Step: 61720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:52,754-Speed 3257.58 samples/sec Loss 6.7739 LearningRate 0.0565 Epoch: 4 
Global Step: 61730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:55,767-Speed 3400.07 samples/sec Loss 6.7764 LearningRate 0.0565 Epoch: 4 Global Step: 61740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:02:58,783-Speed 3396.09 samples/sec Loss 6.7875 LearningRate 0.0565 Epoch: 4 Global Step: 61750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:01,872-Speed 3316.65 samples/sec Loss 6.7793 LearningRate 0.0565 Epoch: 4 Global Step: 61760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:04,935-Speed 3343.78 samples/sec Loss 6.8029 LearningRate 0.0565 Epoch: 4 Global Step: 61770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:08,013-Speed 3328.21 samples/sec Loss 6.7428 LearningRate 0.0564 Epoch: 4 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:03:11,031-Speed 3394.11 samples/sec Loss 6.7861 LearningRate 0.0564 Epoch: 4 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:03:14,045-Speed 3397.74 samples/sec Loss 6.8255 LearningRate 0.0564 Epoch: 4 Global Step: 61800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:17,096-Speed 3357.85 samples/sec Loss 6.6893 LearningRate 0.0564 Epoch: 4 Global Step: 61810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:20,089-Speed 3422.87 samples/sec Loss 6.7240 LearningRate 0.0564 Epoch: 4 Global Step: 61820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:23,138-Speed 3358.85 samples/sec Loss 6.8166 LearningRate 0.0564 Epoch: 4 Global Step: 61830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:26,185-Speed 3361.91 samples/sec Loss 6.7764 LearningRate 0.0564 Epoch: 4 Global Step: 61840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:03:29,246-Speed 3346.14 samples/sec Loss 6.8430 LearningRate 0.0564 Epoch: 4 Global Step: 61850 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:03:32,259-Speed 3399.57 samples/sec Loss 6.7936 LearningRate 0.0564 Epoch: 4 Global Step: 61860 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:35,297-Speed 3372.34 samples/sec Loss 6.7564 LearningRate 0.0564 Epoch: 4 Global Step: 61870 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:38,370-Speed 3333.49 samples/sec Loss 6.9014 LearningRate 0.0564 Epoch: 4 Global Step: 61880 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:41,441-Speed 3334.91 samples/sec Loss 6.7527 LearningRate 0.0564 Epoch: 4 Global Step: 61890 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:44,485-Speed 3365.27 samples/sec Loss 6.7518 LearningRate 0.0564 Epoch: 4 Global Step: 61900 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:47,571-Speed 3319.01 samples/sec Loss 6.7304 LearningRate 0.0564 Epoch: 4 Global Step: 61910 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:50,642-Speed 3335.46 samples/sec Loss 6.9454 LearningRate 0.0564 Epoch: 4 Global Step: 61920 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:53,681-Speed 3371.14 samples/sec Loss 6.8760 LearningRate 0.0564 Epoch: 4 Global Step: 61930 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:56,705-Speed 3386.56 samples/sec Loss 6.8197 LearningRate 0.0563 Epoch: 4 Global Step: 61940 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:03:59,708-Speed 3411.40 samples/sec Loss 6.7837 LearningRate 0.0563 Epoch: 4 Global Step: 61950 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:04:02,748-Speed 3369.06 samples/sec Loss 6.8131 LearningRate 0.0563 Epoch: 4 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:05,812-Speed 3343.78 samples/sec Loss 6.7037 LearningRate 0.0563 Epoch: 4 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:08,860-Speed 3360.52 
samples/sec Loss 6.8405 LearningRate 0.0563 Epoch: 4 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:11,934-Speed 3332.02 samples/sec Loss 6.7973 LearningRate 0.0563 Epoch: 4 Global Step: 61990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:15,084-Speed 3251.75 samples/sec Loss 6.7759 LearningRate 0.0563 Epoch: 4 Global Step: 62000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:18,122-Speed 3371.32 samples/sec Loss 6.7493 LearningRate 0.0563 Epoch: 4 Global Step: 62010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:21,177-Speed 3353.54 samples/sec Loss 6.7872 LearningRate 0.0563 Epoch: 4 Global Step: 62020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:24,266-Speed 3316.11 samples/sec Loss 6.8434 LearningRate 0.0563 Epoch: 4 Global Step: 62030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:27,376-Speed 3293.48 samples/sec Loss 6.7706 LearningRate 0.0563 Epoch: 4 Global Step: 62040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:30,439-Speed 3344.42 samples/sec Loss 6.7871 LearningRate 0.0563 Epoch: 4 Global Step: 62050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:04:33,463-Speed 3386.81 samples/sec Loss 6.7668 LearningRate 0.0563 Epoch: 4 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:04:36,537-Speed 3333.03 samples/sec Loss 6.8030 LearningRate 0.0563 Epoch: 4 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:04:39,554-Speed 3393.91 samples/sec Loss 6.8044 LearningRate 0.0563 Epoch: 4 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:04:42,596-Speed 3368.13 samples/sec Loss 6.7423 LearningRate 0.0563 Epoch: 4 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:04:45,876-Speed 3123.05 samples/sec Loss 6.7985 LearningRate 0.0563 Epoch: 4 
Global Step: 62100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:17,101-Speed 327.95 samples/sec Loss 6.0436 LearningRate 0.0562 Epoch: 5 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:20,239-Speed 3264.78 samples/sec Loss 5.2854 LearningRate 0.0562 Epoch: 5 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:23,309-Speed 3335.98 samples/sec Loss 5.2553 LearningRate 0.0562 Epoch: 5 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:26,371-Speed 3346.31 samples/sec Loss 5.2035 LearningRate 0.0562 Epoch: 5 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:29,494-Speed 3279.56 samples/sec Loss 5.1410 LearningRate 0.0562 Epoch: 5 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:32,521-Speed 3383.76 samples/sec Loss 5.1855 LearningRate 0.0562 Epoch: 5 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 07:05:35,564-Speed 3366.47 samples/sec Loss 5.1190 LearningRate 0.0562 Epoch: 5 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:38,596-Speed 3378.89 samples/sec Loss 5.2076 LearningRate 0.0562 Epoch: 5 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:41,654-Speed 3349.19 samples/sec Loss 5.2136 LearningRate 0.0562 Epoch: 5 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:44,675-Speed 3390.94 samples/sec Loss 5.1406 LearningRate 0.0562 Epoch: 5 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:47,711-Speed 3373.91 samples/sec Loss 5.1811 LearningRate 0.0562 Epoch: 5 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:50,732-Speed 3390.29 samples/sec Loss 5.1106 LearningRate 0.0562 Epoch: 5 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 07:05:53,820-Speed 3317.98 samples/sec Loss 5.1114 LearningRate 0.0562 Epoch: 5 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:56,984-Speed 3236.40 samples/sec Loss 5.2361 LearningRate 0.0562 Epoch: 5 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:05:59,990-Speed 3408.51 samples/sec Loss 5.2502 LearningRate 0.0562 Epoch: 5 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:03,050-Speed 3347.62 samples/sec Loss 5.2655 LearningRate 0.0562 Epoch: 5 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:06,052-Speed 3412.04 samples/sec Loss 5.2122 LearningRate 0.0562 Epoch: 5 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:09,056-Speed 3409.30 samples/sec Loss 5.1602 LearningRate 0.0561 Epoch: 5 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:12,075-Speed 3393.49 samples/sec Loss 5.2266 LearningRate 0.0561 Epoch: 5 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:15,090-Speed 3397.95 samples/sec Loss 5.2117 LearningRate 0.0561 Epoch: 5 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:18,139-Speed 3358.70 samples/sec Loss 5.2068 LearningRate 0.0561 Epoch: 5 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:21,176-Speed 3373.11 samples/sec Loss 5.2045 LearningRate 0.0561 Epoch: 5 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:24,267-Speed 3314.64 samples/sec Loss 5.2228 LearningRate 0.0561 Epoch: 5 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:27,325-Speed 3348.48 samples/sec Loss 5.3077 LearningRate 0.0561 Epoch: 5 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:30,430-Speed 3299.68 
samples/sec Loss 5.2368 LearningRate 0.0561 Epoch: 5 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:06:33,438-Speed 3405.33 samples/sec Loss 5.3677 LearningRate 0.0561 Epoch: 5 Global Step: 62360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:06:36,495-Speed 3350.27 samples/sec Loss 5.2370 LearningRate 0.0561 Epoch: 5 Global Step: 62370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:06:39,611-Speed 3287.01 samples/sec Loss 5.3264 LearningRate 0.0561 Epoch: 5 Global Step: 62380 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:06:42,661-Speed 3359.12 samples/sec Loss 5.2407 LearningRate 0.0561 Epoch: 5 Global Step: 62390 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:06:45,665-Speed 3409.87 samples/sec Loss 5.3958 LearningRate 0.0561 Epoch: 5 Global Step: 62400 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:06:48,715-Speed 3358.23 samples/sec Loss 5.3562 LearningRate 0.0561 Epoch: 5 Global Step: 62410 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:06:51,787-Speed 3334.71 samples/sec Loss 5.3924 LearningRate 0.0561 Epoch: 5 Global Step: 62420 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:06:54,794-Speed 3406.70 samples/sec Loss 5.3661 LearningRate 0.0561 Epoch: 5 Global Step: 62430 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:06:57,830-Speed 3373.60 samples/sec Loss 5.3451 LearningRate 0.0560 Epoch: 5 Global Step: 62440 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:07:00,853-Speed 3388.87 samples/sec Loss 5.3339 LearningRate 0.0560 Epoch: 5 Global Step: 62450 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:07:03,946-Speed 3311.83 samples/sec Loss 5.2643 LearningRate 0.0560 Epoch: 5 Global Step: 62460 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:07:07,042-Speed 3307.81 samples/sec Loss 5.3466 LearningRate 0.0560 Epoch: 5 
Global Step: 62470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:10,046-Speed 3409.81 samples/sec Loss 5.2855 LearningRate 0.0560 Epoch: 5 Global Step: 62480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:13,068-Speed 3389.17 samples/sec Loss 5.4047 LearningRate 0.0560 Epoch: 5 Global Step: 62490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:16,086-Speed 3395.25 samples/sec Loss 5.2634 LearningRate 0.0560 Epoch: 5 Global Step: 62500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:19,115-Speed 3381.08 samples/sec Loss 5.2822 LearningRate 0.0560 Epoch: 5 Global Step: 62510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:22,120-Speed 3409.14 samples/sec Loss 5.3074 LearningRate 0.0560 Epoch: 5 Global Step: 62520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:25,120-Speed 3414.52 samples/sec Loss 5.3642 LearningRate 0.0560 Epoch: 5 Global Step: 62530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:28,159-Speed 3370.31 samples/sec Loss 5.3322 LearningRate 0.0560 Epoch: 5 Global Step: 62540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:31,148-Speed 3427.51 samples/sec Loss 5.4012 LearningRate 0.0560 Epoch: 5 Global Step: 62550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:34,173-Speed 3386.41 samples/sec Loss 5.3952 LearningRate 0.0560 Epoch: 5 Global Step: 62560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:37,190-Speed 3394.36 samples/sec Loss 5.3804 LearningRate 0.0560 Epoch: 5 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:07:40,240-Speed 3359.03 samples/sec Loss 5.2570 LearningRate 0.0560 Epoch: 5 Global Step: 62580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:43,337-Speed 3307.73 samples/sec Loss 5.3430 LearningRate 0.0560 Epoch: 5 Global Step: 62590 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:07:46,365-Speed 3383.13 samples/sec Loss 5.3266 LearningRate 0.0560 Epoch: 5 Global Step: 62600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:49,379-Speed 3398.06 samples/sec Loss 5.3727 LearningRate 0.0559 Epoch: 5 Global Step: 62610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:52,418-Speed 3371.08 samples/sec Loss 5.3958 LearningRate 0.0559 Epoch: 5 Global Step: 62620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:55,435-Speed 3395.55 samples/sec Loss 5.4697 LearningRate 0.0559 Epoch: 5 Global Step: 62630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:07:58,451-Speed 3396.55 samples/sec Loss 5.4711 LearningRate 0.0559 Epoch: 5 Global Step: 62640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:01,505-Speed 3353.60 samples/sec Loss 5.3254 LearningRate 0.0559 Epoch: 5 Global Step: 62650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:04,583-Speed 3328.14 samples/sec Loss 5.4678 LearningRate 0.0559 Epoch: 5 Global Step: 62660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:07,608-Speed 3386.50 samples/sec Loss 5.3988 LearningRate 0.0559 Epoch: 5 Global Step: 62670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:10,604-Speed 3418.37 samples/sec Loss 5.4226 LearningRate 0.0559 Epoch: 5 Global Step: 62680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:08:13,646-Speed 3366.63 samples/sec Loss 5.4583 LearningRate 0.0559 Epoch: 5 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:16,652-Speed 3407.51 samples/sec Loss 5.3968 LearningRate 0.0559 Epoch: 5 Global Step: 62700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:19,680-Speed 3383.83 samples/sec Loss 5.5079 LearningRate 0.0559 Epoch: 5 Global Step: 62710 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:22,718-Speed 3371.98 
samples/sec Loss 5.4619 LearningRate 0.0559 Epoch: 5 Global Step: 62720 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:25,724-Speed 3406.65 samples/sec Loss 5.3682 LearningRate 0.0559 Epoch: 5 Global Step: 62730 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:28,722-Speed 3416.67 samples/sec Loss 5.3895 LearningRate 0.0559 Epoch: 5 Global Step: 62740 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:31,749-Speed 3384.97 samples/sec Loss 5.4998 LearningRate 0.0559 Epoch: 5 Global Step: 62750 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:34,791-Speed 3366.27 samples/sec Loss 5.3729 LearningRate 0.0559 Epoch: 5 Global Step: 62760 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:37,826-Speed 3376.15 samples/sec Loss 5.4575 LearningRate 0.0558 Epoch: 5 Global Step: 62770 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:40,857-Speed 3378.82 samples/sec Loss 5.4577 LearningRate 0.0558 Epoch: 5 Global Step: 62780 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:43,938-Speed 3325.07 samples/sec Loss 5.3501 LearningRate 0.0558 Epoch: 5 Global Step: 62790 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:46,976-Speed 3371.81 samples/sec Loss 5.3795 LearningRate 0.0558 Epoch: 5 Global Step: 62800 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:08:49,976-Speed 3413.82 samples/sec Loss 5.4623 LearningRate 0.0558 Epoch: 5 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:53,071-Speed 3309.94 samples/sec Loss 5.4076 LearningRate 0.0558 Epoch: 5 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:56,089-Speed 3394.21 samples/sec Loss 5.4315 LearningRate 0.0558 Epoch: 5 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:08:59,141-Speed 3356.62 samples/sec Loss 5.4244 LearningRate 0.0558 Epoch: 5 
Global Step: 62840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:09:02,150-Speed 3404.50 samples/sec Loss 5.5081 LearningRate 0.0558 Epoch: 5 Global Step: 62850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:09:05,152-Speed 3411.79 samples/sec Loss 5.5006 LearningRate 0.0558 Epoch: 5 Global Step: 62860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:09:08,159-Speed 3406.80 samples/sec Loss 5.4321 LearningRate 0.0558 Epoch: 5 Global Step: 62870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:09:11,157-Speed 3416.87 samples/sec Loss 5.5991 LearningRate 0.0558 Epoch: 5 Global Step: 62880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:09:14,166-Speed 3404.31 samples/sec Loss 5.4184 LearningRate 0.0558 Epoch: 5 Global Step: 62890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:09:17,236-Speed 3336.21 samples/sec Loss 5.4584 LearningRate 0.0558 Epoch: 5 Global Step: 62900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:09:20,262-Speed 3384.26 samples/sec Loss 5.4496 LearningRate 0.0558 Epoch: 5 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:23,296-Speed 3377.17 samples/sec Loss 5.4662 LearningRate 0.0558 Epoch: 5 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:26,311-Speed 3397.11 samples/sec Loss 5.4695 LearningRate 0.0558 Epoch: 5 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:29,393-Speed 3323.79 samples/sec Loss 5.5759 LearningRate 0.0557 Epoch: 5 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:32,406-Speed 3399.46 samples/sec Loss 5.6518 LearningRate 0.0557 Epoch: 5 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:35,457-Speed 3357.30 samples/sec Loss 5.6774 LearningRate 0.0557 Epoch: 5 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 07:09:38,468-Speed 3402.24 samples/sec Loss 5.5634 LearningRate 0.0557 Epoch: 5 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:41,511-Speed 3366.08 samples/sec Loss 5.4944 LearningRate 0.0557 Epoch: 5 Global Step: 62980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:44,562-Speed 3358.03 samples/sec Loss 5.5133 LearningRate 0.0557 Epoch: 5 Global Step: 62990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:47,571-Speed 3404.09 samples/sec Loss 5.5607 LearningRate 0.0557 Epoch: 5 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:50,604-Speed 3376.87 samples/sec Loss 5.5825 LearningRate 0.0557 Epoch: 5 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:53,626-Speed 3389.59 samples/sec Loss 5.5720 LearningRate 0.0557 Epoch: 5 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:56,654-Speed 3383.55 samples/sec Loss 5.4836 LearningRate 0.0557 Epoch: 5 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:09:59,663-Speed 3403.91 samples/sec Loss 5.6519 LearningRate 0.0557 Epoch: 5 Global Step: 63040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:02,685-Speed 3389.87 samples/sec Loss 5.6127 LearningRate 0.0557 Epoch: 5 Global Step: 63050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:05,731-Speed 3363.07 samples/sec Loss 5.6822 LearningRate 0.0557 Epoch: 5 Global Step: 63060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:08,750-Speed 3392.67 samples/sec Loss 5.4599 LearningRate 0.0557 Epoch: 5 Global Step: 63070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:11,791-Speed 3368.05 samples/sec Loss 5.4643 LearningRate 0.0557 Epoch: 5 Global Step: 63080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:14,811-Speed 3392.54 
samples/sec Loss 5.5424 LearningRate 0.0557 Epoch: 5 Global Step: 63090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:17,849-Speed 3370.94 samples/sec Loss 5.5130 LearningRate 0.0557 Epoch: 5 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:20,863-Speed 3398.83 samples/sec Loss 5.6007 LearningRate 0.0556 Epoch: 5 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:23,861-Speed 3416.69 samples/sec Loss 5.7373 LearningRate 0.0556 Epoch: 5 Global Step: 63120 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:26,869-Speed 3406.89 samples/sec Loss 5.6441 LearningRate 0.0556 Epoch: 5 Global Step: 63130 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:29,985-Speed 3286.77 samples/sec Loss 5.5627 LearningRate 0.0556 Epoch: 5 Global Step: 63140 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:33,040-Speed 3352.70 samples/sec Loss 5.5570 LearningRate 0.0556 Epoch: 5 Global Step: 63150 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:36,073-Speed 3377.31 samples/sec Loss 5.4936 LearningRate 0.0556 Epoch: 5 Global Step: 63160 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:39,086-Speed 3400.09 samples/sec Loss 5.4351 LearningRate 0.0556 Epoch: 5 Global Step: 63170 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:42,113-Speed 3384.35 samples/sec Loss 5.5656 LearningRate 0.0556 Epoch: 5 Global Step: 63180 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:45,139-Speed 3385.11 samples/sec Loss 5.4982 LearningRate 0.0556 Epoch: 5 Global Step: 63190 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:48,192-Speed 3354.75 samples/sec Loss 5.5693 LearningRate 0.0556 Epoch: 5 Global Step: 63200 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:51,206-Speed 3398.78 samples/sec Loss 5.5771 LearningRate 0.0556 Epoch: 5 
Global Step: 63210 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:10:54,216-Speed 3403.39 samples/sec Loss 5.6811 LearningRate 0.0556 Epoch: 5 Global Step: 63220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:10:57,232-Speed 3396.36 samples/sec Loss 5.6878 LearningRate 0.0556 Epoch: 5 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:00,279-Speed 3362.41 samples/sec Loss 5.6470 LearningRate 0.0556 Epoch: 5 Global Step: 63240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:03,288-Speed 3404.03 samples/sec Loss 5.6513 LearningRate 0.0556 Epoch: 5 Global Step: 63250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:06,346-Speed 3349.52 samples/sec Loss 5.5967 LearningRate 0.0556 Epoch: 5 Global Step: 63260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:09,359-Speed 3399.32 samples/sec Loss 5.7393 LearningRate 0.0555 Epoch: 5 Global Step: 63270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:12,371-Speed 3400.85 samples/sec Loss 5.6776 LearningRate 0.0555 Epoch: 5 Global Step: 63280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:15,437-Speed 3341.50 samples/sec Loss 5.6121 LearningRate 0.0555 Epoch: 5 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:18,449-Speed 3401.01 samples/sec Loss 5.5679 LearningRate 0.0555 Epoch: 5 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:21,481-Speed 3377.89 samples/sec Loss 5.5907 LearningRate 0.0555 Epoch: 5 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:24,536-Speed 3353.02 samples/sec Loss 5.6464 LearningRate 0.0555 Epoch: 5 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:11:27,554-Speed 3394.49 samples/sec Loss 5.6301 LearningRate 0.0555 Epoch: 5 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 07:11:30,567-Speed 3399.75 samples/sec Loss 5.5809 LearningRate 0.0555 Epoch: 5 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:11:33,658-Speed 3313.67 samples/sec Loss 5.6119 LearningRate 0.0555 Epoch: 5 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:11:36,716-Speed 3350.10 samples/sec Loss 5.6998 LearningRate 0.0555 Epoch: 5 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:11:39,745-Speed 3381.24 samples/sec Loss 5.6318 LearningRate 0.0555 Epoch: 5 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:11:42,782-Speed 3372.85 samples/sec Loss 5.7201 LearningRate 0.0555 Epoch: 5 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:11:45,803-Speed 3390.67 samples/sec Loss 5.8212 LearningRate 0.0555 Epoch: 5 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:11:48,868-Speed 3341.62 samples/sec Loss 5.7203 LearningRate 0.0555 Epoch: 5 Global Step: 63400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:51,934-Speed 3341.40 samples/sec Loss 5.7185 LearningRate 0.0555 Epoch: 5 Global Step: 63410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:54,981-Speed 3361.71 samples/sec Loss 5.7346 LearningRate 0.0555 Epoch: 5 Global Step: 63420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:11:58,022-Speed 3369.20 samples/sec Loss 5.7029 LearningRate 0.0555 Epoch: 5 Global Step: 63430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:01,027-Speed 3409.00 samples/sec Loss 5.6789 LearningRate 0.0554 Epoch: 5 Global Step: 63440 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:04,045-Speed 3393.66 samples/sec Loss 5.7523 LearningRate 0.0554 Epoch: 5 Global Step: 63450 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:07,051-Speed 3407.61 
samples/sec Loss 5.7415 LearningRate 0.0554 Epoch: 5 Global Step: 63460 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:10,051-Speed 3414.07 samples/sec Loss 5.7790 LearningRate 0.0554 Epoch: 5 Global Step: 63470 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:13,078-Speed 3384.15 samples/sec Loss 5.6338 LearningRate 0.0554 Epoch: 5 Global Step: 63480 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:16,109-Speed 3380.22 samples/sec Loss 5.6856 LearningRate 0.0554 Epoch: 5 Global Step: 63490 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:19,110-Speed 3412.38 samples/sec Loss 5.6191 LearningRate 0.0554 Epoch: 5 Global Step: 63500 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:22,116-Speed 3407.98 samples/sec Loss 5.7248 LearningRate 0.0554 Epoch: 5 Global Step: 63510 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:25,119-Speed 3410.70 samples/sec Loss 5.6381 LearningRate 0.0554 Epoch: 5 Global Step: 63520 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:28,132-Speed 3399.97 samples/sec Loss 5.6332 LearningRate 0.0554 Epoch: 5 Global Step: 63530 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:12:31,137-Speed 3408.41 samples/sec Loss 5.6204 LearningRate 0.0554 Epoch: 5 Global Step: 63540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:34,150-Speed 3399.75 samples/sec Loss 5.7883 LearningRate 0.0554 Epoch: 5 Global Step: 63550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:37,202-Speed 3355.99 samples/sec Loss 5.6418 LearningRate 0.0554 Epoch: 5 Global Step: 63560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:40,250-Speed 3361.05 samples/sec Loss 5.7529 LearningRate 0.0554 Epoch: 5 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:43,286-Speed 3374.18 samples/sec Loss 5.7562 LearningRate 0.0554 Epoch: 5 
Global Step: 63580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:46,328-Speed 3367.20 samples/sec Loss 5.6646 LearningRate 0.0554 Epoch: 5 Global Step: 63590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:49,361-Speed 3377.97 samples/sec Loss 5.7764 LearningRate 0.0554 Epoch: 5 Global Step: 63600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:52,395-Speed 3375.95 samples/sec Loss 5.7169 LearningRate 0.0553 Epoch: 5 Global Step: 63610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:55,520-Speed 3277.35 samples/sec Loss 5.8248 LearningRate 0.0553 Epoch: 5 Global Step: 63620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:12:58,540-Speed 3392.67 samples/sec Loss 5.6333 LearningRate 0.0553 Epoch: 5 Global Step: 63630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:01,542-Speed 3412.39 samples/sec Loss 5.7428 LearningRate 0.0553 Epoch: 5 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:04,607-Speed 3341.21 samples/sec Loss 5.8310 LearningRate 0.0553 Epoch: 5 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:07,631-Speed 3387.35 samples/sec Loss 5.8262 LearningRate 0.0553 Epoch: 5 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:10,637-Speed 3408.04 samples/sec Loss 5.6899 LearningRate 0.0553 Epoch: 5 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:13,676-Speed 3371.16 samples/sec Loss 5.7478 LearningRate 0.0553 Epoch: 5 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:16,702-Speed 3384.93 samples/sec Loss 5.7255 LearningRate 0.0553 Epoch: 5 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:19,698-Speed 3417.96 samples/sec Loss 5.7190 LearningRate 0.0553 Epoch: 5 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 07:13:22,723-Speed 3387.41 samples/sec Loss 5.7661 LearningRate 0.0553 Epoch: 5 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:25,801-Speed 3327.71 samples/sec Loss 5.8148 LearningRate 0.0553 Epoch: 5 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:28,820-Speed 3392.64 samples/sec Loss 5.8041 LearningRate 0.0553 Epoch: 5 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:31,818-Speed 3416.87 samples/sec Loss 5.8277 LearningRate 0.0553 Epoch: 5 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:13:34,818-Speed 3413.99 samples/sec Loss 5.8527 LearningRate 0.0553 Epoch: 5 Global Step: 63750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:37,852-Speed 3376.37 samples/sec Loss 5.7068 LearningRate 0.0553 Epoch: 5 Global Step: 63760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:40,866-Speed 3399.00 samples/sec Loss 5.8326 LearningRate 0.0552 Epoch: 5 Global Step: 63770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:43,873-Speed 3406.27 samples/sec Loss 5.7316 LearningRate 0.0552 Epoch: 5 Global Step: 63780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:46,899-Speed 3384.82 samples/sec Loss 5.7272 LearningRate 0.0552 Epoch: 5 Global Step: 63790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:49,927-Speed 3382.61 samples/sec Loss 5.7660 LearningRate 0.0552 Epoch: 5 Global Step: 63800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:52,929-Speed 3412.44 samples/sec Loss 5.7734 LearningRate 0.0552 Epoch: 5 Global Step: 63810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:55,951-Speed 3389.29 samples/sec Loss 5.8078 LearningRate 0.0552 Epoch: 5 Global Step: 63820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:13:58,948-Speed 3418.63 
samples/sec Loss 5.8218 LearningRate 0.0552 Epoch: 5 Global Step: 63830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:01,964-Speed 3395.67 samples/sec Loss 5.7826 LearningRate 0.0552 Epoch: 5 Global Step: 63840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:05,020-Speed 3352.67 samples/sec Loss 5.7078 LearningRate 0.0552 Epoch: 5 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:14:08,082-Speed 3344.86 samples/sec Loss 5.8743 LearningRate 0.0552 Epoch: 5 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:14:11,105-Speed 3389.05 samples/sec Loss 5.9314 LearningRate 0.0552 Epoch: 5 Global Step: 63870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:14,153-Speed 3360.22 samples/sec Loss 5.8329 LearningRate 0.0552 Epoch: 5 Global Step: 63880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:17,155-Speed 3412.63 samples/sec Loss 5.7831 LearningRate 0.0552 Epoch: 5 Global Step: 63890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:20,170-Speed 3397.99 samples/sec Loss 5.8094 LearningRate 0.0552 Epoch: 5 Global Step: 63900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:23,169-Speed 3415.14 samples/sec Loss 5.8150 LearningRate 0.0552 Epoch: 5 Global Step: 63910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:26,274-Speed 3299.59 samples/sec Loss 5.8895 LearningRate 0.0552 Epoch: 5 Global Step: 63920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:29,324-Speed 3358.53 samples/sec Loss 5.7802 LearningRate 0.0552 Epoch: 5 Global Step: 63930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:32,354-Speed 3380.16 samples/sec Loss 5.7889 LearningRate 0.0551 Epoch: 5 Global Step: 63940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:35,378-Speed 3387.69 samples/sec Loss 5.7544 LearningRate 0.0551 Epoch: 5 
Global Step: 63950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:38,458-Speed 3326.05 samples/sec Loss 5.7497 LearningRate 0.0551 Epoch: 5 Global Step: 63960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:41,491-Speed 3377.41 samples/sec Loss 5.7791 LearningRate 0.0551 Epoch: 5 Global Step: 63970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:14:44,498-Speed 3406.22 samples/sec Loss 5.8015 LearningRate 0.0551 Epoch: 5 Global Step: 63980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:47,587-Speed 3316.05 samples/sec Loss 5.8071 LearningRate 0.0551 Epoch: 5 Global Step: 63990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:50,693-Speed 3297.69 samples/sec Loss 5.9080 LearningRate 0.0551 Epoch: 5 Global Step: 64000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:53,742-Speed 3359.97 samples/sec Loss 5.8459 LearningRate 0.0551 Epoch: 5 Global Step: 64010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:56,781-Speed 3370.73 samples/sec Loss 5.8138 LearningRate 0.0551 Epoch: 5 Global Step: 64020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:14:59,898-Speed 3285.81 samples/sec Loss 5.9341 LearningRate 0.0551 Epoch: 5 Global Step: 64030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:02,957-Speed 3349.15 samples/sec Loss 5.7971 LearningRate 0.0551 Epoch: 5 Global Step: 64040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:05,978-Speed 3390.17 samples/sec Loss 5.8648 LearningRate 0.0551 Epoch: 5 Global Step: 64050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:08,998-Speed 3391.92 samples/sec Loss 5.8894 LearningRate 0.0551 Epoch: 5 Global Step: 64060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:12,013-Speed 3397.65 samples/sec Loss 5.7582 LearningRate 0.0551 Epoch: 5 Global Step: 64070 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:15:15,020-Speed 3405.92 samples/sec Loss 5.8514 LearningRate 0.0551 Epoch: 5 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:15:18,086-Speed 3340.99 samples/sec Loss 5.8798 LearningRate 0.0551 Epoch: 5 Global Step: 64090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:21,141-Speed 3353.43 samples/sec Loss 5.8904 LearningRate 0.0551 Epoch: 5 Global Step: 64100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:24,211-Speed 3336.35 samples/sec Loss 5.8005 LearningRate 0.0550 Epoch: 5 Global Step: 64110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:27,283-Speed 3334.89 samples/sec Loss 5.8786 LearningRate 0.0550 Epoch: 5 Global Step: 64120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:30,384-Speed 3302.86 samples/sec Loss 5.9227 LearningRate 0.0550 Epoch: 5 Global Step: 64130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:33,398-Speed 3398.14 samples/sec Loss 5.9080 LearningRate 0.0550 Epoch: 5 Global Step: 64140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:36,474-Speed 3331.09 samples/sec Loss 5.9102 LearningRate 0.0550 Epoch: 5 Global Step: 64150 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:39,518-Speed 3364.00 samples/sec Loss 5.8885 LearningRate 0.0550 Epoch: 5 Global Step: 64160 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:42,542-Speed 3387.76 samples/sec Loss 5.7972 LearningRate 0.0550 Epoch: 5 Global Step: 64170 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:45,547-Speed 3408.53 samples/sec Loss 5.8557 LearningRate 0.0550 Epoch: 5 Global Step: 64180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:48,574-Speed 3383.89 samples/sec Loss 5.9019 LearningRate 0.0550 Epoch: 5 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:15:51,600-Speed 3385.76 
samples/sec Loss 5.8864 LearningRate 0.0550 Epoch: 5 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:15:54,666-Speed 3340.63 samples/sec Loss 5.9208 LearningRate 0.0550 Epoch: 5 Global Step: 64210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:15:57,677-Speed 3402.13 samples/sec Loss 5.8714 LearningRate 0.0550 Epoch: 5 Global Step: 64220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:00,704-Speed 3384.06 samples/sec Loss 5.8770 LearningRate 0.0550 Epoch: 5 Global Step: 64230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:03,750-Speed 3362.30 samples/sec Loss 5.8996 LearningRate 0.0550 Epoch: 5 Global Step: 64240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:06,811-Speed 3346.32 samples/sec Loss 5.9315 LearningRate 0.0550 Epoch: 5 Global Step: 64250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:09,830-Speed 3393.83 samples/sec Loss 5.9924 LearningRate 0.0550 Epoch: 5 Global Step: 64260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:12,897-Speed 3338.72 samples/sec Loss 5.8721 LearningRate 0.0550 Epoch: 5 Global Step: 64270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:15,930-Speed 3377.41 samples/sec Loss 5.8924 LearningRate 0.0549 Epoch: 5 Global Step: 64280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:18,975-Speed 3363.94 samples/sec Loss 5.9520 LearningRate 0.0549 Epoch: 5 Global Step: 64290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:22,002-Speed 3384.49 samples/sec Loss 5.7626 LearningRate 0.0549 Epoch: 5 Global Step: 64300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:25,107-Speed 3299.29 samples/sec Loss 5.9491 LearningRate 0.0549 Epoch: 5 Global Step: 64310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:28,213-Speed 3297.96 samples/sec Loss 5.9488 LearningRate 0.0549 Epoch: 5 
Global Step: 64320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:31,265-Speed 3356.44 samples/sec Loss 5.8604 LearningRate 0.0549 Epoch: 5 Global Step: 64330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:34,274-Speed 3403.54 samples/sec Loss 5.8690 LearningRate 0.0549 Epoch: 5 Global Step: 64340 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:37,350-Speed 3330.15 samples/sec Loss 5.9182 LearningRate 0.0549 Epoch: 5 Global Step: 64350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:40,457-Speed 3297.87 samples/sec Loss 5.9299 LearningRate 0.0549 Epoch: 5 Global Step: 64360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:43,517-Speed 3346.68 samples/sec Loss 5.8628 LearningRate 0.0549 Epoch: 5 Global Step: 64370 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:46,532-Speed 3398.34 samples/sec Loss 5.8569 LearningRate 0.0549 Epoch: 5 Global Step: 64380 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:49,615-Speed 3322.46 samples/sec Loss 6.0243 LearningRate 0.0549 Epoch: 5 Global Step: 64390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:52,637-Speed 3389.58 samples/sec Loss 5.9527 LearningRate 0.0549 Epoch: 5 Global Step: 64400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:16:55,638-Speed 3413.38 samples/sec Loss 5.8940 LearningRate 0.0549 Epoch: 5 Global Step: 64410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:16:58,638-Speed 3413.44 samples/sec Loss 5.9349 LearningRate 0.0549 Epoch: 5 Global Step: 64420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:01,693-Speed 3353.64 samples/sec Loss 5.8973 LearningRate 0.0549 Epoch: 5 Global Step: 64430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:04,833-Speed 3262.11 samples/sec Loss 5.9789 LearningRate 0.0548 Epoch: 5 Global Step: 64440 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:17:08,574-Speed 2737.46 samples/sec Loss 5.8625 LearningRate 0.0548 Epoch: 5 Global Step: 64450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:11,636-Speed 3346.40 samples/sec Loss 5.9290 LearningRate 0.0548 Epoch: 5 Global Step: 64460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:14,698-Speed 3344.44 samples/sec Loss 5.9310 LearningRate 0.0548 Epoch: 5 Global Step: 64470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:17,765-Speed 3340.29 samples/sec Loss 5.8061 LearningRate 0.0548 Epoch: 5 Global Step: 64480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:20,810-Speed 3375.32 samples/sec Loss 5.9732 LearningRate 0.0548 Epoch: 5 Global Step: 64490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:23,893-Speed 3322.95 samples/sec Loss 5.9327 LearningRate 0.0548 Epoch: 5 Global Step: 64500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:26,925-Speed 3377.87 samples/sec Loss 5.8996 LearningRate 0.0548 Epoch: 5 Global Step: 64510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:30,049-Speed 3279.09 samples/sec Loss 6.0421 LearningRate 0.0548 Epoch: 5 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:33,079-Speed 3380.60 samples/sec Loss 5.9474 LearningRate 0.0548 Epoch: 5 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:36,114-Speed 3375.16 samples/sec Loss 6.0406 LearningRate 0.0548 Epoch: 5 Global Step: 64540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:39,126-Speed 3401.22 samples/sec Loss 5.9291 LearningRate 0.0548 Epoch: 5 Global Step: 64550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:42,165-Speed 3370.89 samples/sec Loss 5.9528 LearningRate 0.0548 Epoch: 5 Global Step: 64560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:45,206-Speed 3367.84 
samples/sec Loss 5.9671 LearningRate 0.0548 Epoch: 5 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:48,224-Speed 3394.69 samples/sec Loss 5.8481 LearningRate 0.0548 Epoch: 5 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:51,266-Speed 3367.35 samples/sec Loss 5.8957 LearningRate 0.0548 Epoch: 5 Global Step: 64590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:17:54,296-Speed 3380.68 samples/sec Loss 6.0490 LearningRate 0.0548 Epoch: 5 Global Step: 64600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:17:57,365-Speed 3337.60 samples/sec Loss 6.0325 LearningRate 0.0547 Epoch: 5 Global Step: 64610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:00,451-Speed 3319.64 samples/sec Loss 6.0484 LearningRate 0.0547 Epoch: 5 Global Step: 64620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:03,482-Speed 3379.35 samples/sec Loss 5.9946 LearningRate 0.0547 Epoch: 5 Global Step: 64630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:06,587-Speed 3299.18 samples/sec Loss 6.0318 LearningRate 0.0547 Epoch: 5 Global Step: 64640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:09,621-Speed 3376.24 samples/sec Loss 6.0229 LearningRate 0.0547 Epoch: 5 Global Step: 64650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:12,755-Speed 3268.34 samples/sec Loss 6.0024 LearningRate 0.0547 Epoch: 5 Global Step: 64660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:15,794-Speed 3370.54 samples/sec Loss 6.0078 LearningRate 0.0547 Epoch: 5 Global Step: 64670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:18,815-Speed 3390.36 samples/sec Loss 5.9626 LearningRate 0.0547 Epoch: 5 Global Step: 64680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:21,850-Speed 3375.49 samples/sec Loss 6.0703 LearningRate 0.0547 Epoch: 5 
Global Step: 64690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:24,921-Speed 3335.98 samples/sec Loss 6.0077 LearningRate 0.0547 Epoch: 5 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:18:28,036-Speed 3287.70 samples/sec Loss 5.9771 LearningRate 0.0547 Epoch: 5 Global Step: 64710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:31,102-Speed 3340.66 samples/sec Loss 5.9661 LearningRate 0.0547 Epoch: 5 Global Step: 64720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:34,143-Speed 3368.77 samples/sec Loss 5.9433 LearningRate 0.0547 Epoch: 5 Global Step: 64730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:37,210-Speed 3340.10 samples/sec Loss 5.9016 LearningRate 0.0547 Epoch: 5 Global Step: 64740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:40,315-Speed 3298.64 samples/sec Loss 6.0572 LearningRate 0.0547 Epoch: 5 Global Step: 64750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:44,707-Speed 2331.93 samples/sec Loss 5.9560 LearningRate 0.0547 Epoch: 5 Global Step: 64760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:47,744-Speed 3372.91 samples/sec Loss 5.9409 LearningRate 0.0547 Epoch: 5 Global Step: 64770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:50,779-Speed 3375.55 samples/sec Loss 5.9418 LearningRate 0.0546 Epoch: 5 Global Step: 64780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:53,814-Speed 3375.13 samples/sec Loss 5.9714 LearningRate 0.0546 Epoch: 5 Global Step: 64790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:56,889-Speed 3331.06 samples/sec Loss 6.0589 LearningRate 0.0546 Epoch: 5 Global Step: 64800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:18:59,956-Speed 3339.82 samples/sec Loss 6.0328 LearningRate 0.0546 Epoch: 5 Global Step: 64810 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 07:19:02,999-Speed 3365.61 samples/sec Loss 5.9650 LearningRate 0.0546 Epoch: 5 Global Step: 64820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:19:06,019-Speed 3392.66 samples/sec Loss 5.9039 LearningRate 0.0546 Epoch: 5 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:19:09,029-Speed 3402.80 samples/sec Loss 6.0495 LearningRate 0.0546 Epoch: 5 Global Step: 64840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:19:12,135-Speed 3297.86 samples/sec Loss 6.0371 LearningRate 0.0546 Epoch: 5 Global Step: 64850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:19:15,202-Speed 3339.99 samples/sec Loss 6.0546 LearningRate 0.0546 Epoch: 5 Global Step: 64860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:19:18,234-Speed 3379.44 samples/sec Loss 5.9892 LearningRate 0.0546 Epoch: 5 Global Step: 64870 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:21,276-Speed 3367.01 samples/sec Loss 6.0765 LearningRate 0.0546 Epoch: 5 Global Step: 64880 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:24,317-Speed 3367.75 samples/sec Loss 6.1312 LearningRate 0.0546 Epoch: 5 Global Step: 64890 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:27,388-Speed 3335.97 samples/sec Loss 6.0110 LearningRate 0.0546 Epoch: 5 Global Step: 64900 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:30,448-Speed 3347.60 samples/sec Loss 5.9957 LearningRate 0.0546 Epoch: 5 Global Step: 64910 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:33,475-Speed 3383.83 samples/sec Loss 6.0131 LearningRate 0.0546 Epoch: 5 Global Step: 64920 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:36,526-Speed 3357.90 samples/sec Loss 6.1356 LearningRate 0.0546 Epoch: 5 Global Step: 64930 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:39,539-Speed 3399.78 
samples/sec Loss 5.9658 LearningRate 0.0546 Epoch: 5 Global Step: 64940 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:42,581-Speed 3366.85 samples/sec Loss 5.8942 LearningRate 0.0545 Epoch: 5 Global Step: 64950 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:45,590-Speed 3404.84 samples/sec Loss 6.1013 LearningRate 0.0545 Epoch: 5 Global Step: 64960 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:19:48,611-Speed 3390.23 samples/sec Loss 5.9798 LearningRate 0.0545 Epoch: 5 Global Step: 64970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:19:51,726-Speed 3288.65 samples/sec Loss 6.0727 LearningRate 0.0545 Epoch: 5 Global Step: 64980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:19:54,768-Speed 3367.40 samples/sec Loss 6.0536 LearningRate 0.0545 Epoch: 5 Global Step: 64990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:19:57,773-Speed 3408.68 samples/sec Loss 6.0088 LearningRate 0.0545 Epoch: 5 Global Step: 65000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:00,824-Speed 3357.72 samples/sec Loss 5.9403 LearningRate 0.0545 Epoch: 5 Global Step: 65010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:03,876-Speed 3355.38 samples/sec Loss 5.9189 LearningRate 0.0545 Epoch: 5 Global Step: 65020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:06,995-Speed 3284.34 samples/sec Loss 5.9770 LearningRate 0.0545 Epoch: 5 Global Step: 65030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:10,016-Speed 3390.50 samples/sec Loss 6.0874 LearningRate 0.0545 Epoch: 5 Global Step: 65040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:13,036-Speed 3392.38 samples/sec Loss 6.0338 LearningRate 0.0545 Epoch: 5 Global Step: 65050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:16,056-Speed 3391.94 samples/sec Loss 6.1107 LearningRate 0.0545 Epoch: 5 
Global Step: 65060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:19,189-Speed 3269.23 samples/sec Loss 5.9650 LearningRate 0.0545 Epoch: 5 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:20:22,218-Speed 3381.52 samples/sec Loss 6.0408 LearningRate 0.0545 Epoch: 5 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:20:25,239-Speed 3391.42 samples/sec Loss 6.0673 LearningRate 0.0545 Epoch: 5 Global Step: 65090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:20:28,320-Speed 3324.13 samples/sec Loss 6.0891 LearningRate 0.0545 Epoch: 5 Global Step: 65100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:20:31,345-Speed 3386.11 samples/sec Loss 6.0174 LearningRate 0.0545 Epoch: 5 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:20:34,368-Speed 3388.97 samples/sec Loss 6.1173 LearningRate 0.0544 Epoch: 5 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:20:37,438-Speed 3337.33 samples/sec Loss 6.0837 LearningRate 0.0544 Epoch: 5 Global Step: 65130 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:20:40,494-Speed 3352.11 samples/sec Loss 6.0502 LearningRate 0.0544 Epoch: 5 Global Step: 65140 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:20:43,571-Speed 3328.73 samples/sec Loss 6.0238 LearningRate 0.0544 Epoch: 5 Global Step: 65150 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:20:46,604-Speed 3377.24 samples/sec Loss 5.9785 LearningRate 0.0544 Epoch: 5 Global Step: 65160 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:20:49,608-Speed 3410.07 samples/sec Loss 5.9848 LearningRate 0.0544 Epoch: 5 Global Step: 65170 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:20:52,681-Speed 3332.75 samples/sec Loss 6.1200 LearningRate 0.0544 Epoch: 5 Global Step: 65180 Fp16 Grad Scale: 16384 Required: 16 
hours Training: 2022-04-27 07:20:55,740-Speed 3348.86 samples/sec Loss 6.0456 LearningRate 0.0544 Epoch: 5 Global Step: 65190 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:20:58,775-Speed 3375.26 samples/sec Loss 6.0508 LearningRate 0.0544 Epoch: 5 Global Step: 65200 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:21:01,845-Speed 3336.30 samples/sec Loss 6.0257 LearningRate 0.0544 Epoch: 5 Global Step: 65210 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:21:04,941-Speed 3308.82 samples/sec Loss 6.0472 LearningRate 0.0544 Epoch: 5 Global Step: 65220 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:21:07,960-Speed 3393.62 samples/sec Loss 6.1333 LearningRate 0.0544 Epoch: 5 Global Step: 65230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:11,067-Speed 3296.50 samples/sec Loss 6.0647 LearningRate 0.0544 Epoch: 5 Global Step: 65240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:14,079-Speed 3400.90 samples/sec Loss 6.0447 LearningRate 0.0544 Epoch: 5 Global Step: 65250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:17,166-Speed 3318.38 samples/sec Loss 6.1089 LearningRate 0.0544 Epoch: 5 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:20,232-Speed 3341.24 samples/sec Loss 6.0565 LearningRate 0.0544 Epoch: 5 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:23,308-Speed 3329.20 samples/sec Loss 6.0478 LearningRate 0.0543 Epoch: 5 Global Step: 65280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:26,435-Speed 3275.70 samples/sec Loss 6.1717 LearningRate 0.0543 Epoch: 5 Global Step: 65290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:29,466-Speed 3379.35 samples/sec Loss 6.1391 LearningRate 0.0543 Epoch: 5 Global Step: 65300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:32,549-Speed 3322.43 
samples/sec Loss 6.0033 LearningRate 0.0543 Epoch: 5 Global Step: 65310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:35,615-Speed 3341.31 samples/sec Loss 6.1899 LearningRate 0.0543 Epoch: 5 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:21:38,650-Speed 3375.24 samples/sec Loss 6.1413 LearningRate 0.0543 Epoch: 5 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:21:41,698-Speed 3360.11 samples/sec Loss 6.0502 LearningRate 0.0543 Epoch: 5 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:21:44,709-Speed 3402.04 samples/sec Loss 6.0085 LearningRate 0.0543 Epoch: 5 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:21:47,782-Speed 3333.53 samples/sec Loss 6.1223 LearningRate 0.0543 Epoch: 5 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:21:50,874-Speed 3313.06 samples/sec Loss 6.1045 LearningRate 0.0543 Epoch: 5 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:21:54,048-Speed 3227.62 samples/sec Loss 5.9279 LearningRate 0.0543 Epoch: 5 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:21:57,059-Speed 3402.06 samples/sec Loss 6.1190 LearningRate 0.0543 Epoch: 5 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:22:00,168-Speed 3294.86 samples/sec Loss 6.0767 LearningRate 0.0543 Epoch: 5 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:22:03,260-Speed 3312.78 samples/sec Loss 6.0935 LearningRate 0.0543 Epoch: 5 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:22:06,374-Speed 3289.70 samples/sec Loss 6.0256 LearningRate 0.0543 Epoch: 5 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:22:09,360-Speed 3430.16 samples/sec Loss 6.0602 LearningRate 0.0543 Epoch: 5 
Global Step: 65430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:22:12,422-Speed 3345.81 samples/sec Loss 5.9502 LearningRate 0.0543 Epoch: 5 Global Step: 65440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:15,494-Speed 3333.17 samples/sec Loss 6.0160 LearningRate 0.0542 Epoch: 5 Global Step: 65450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:18,555-Speed 3346.69 samples/sec Loss 6.0671 LearningRate 0.0542 Epoch: 5 Global Step: 65460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:21,599-Speed 3365.52 samples/sec Loss 6.0538 LearningRate 0.0542 Epoch: 5 Global Step: 65470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:24,645-Speed 3362.30 samples/sec Loss 6.0769 LearningRate 0.0542 Epoch: 5 Global Step: 65480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:27,719-Speed 3332.97 samples/sec Loss 6.1387 LearningRate 0.0542 Epoch: 5 Global Step: 65490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:30,787-Speed 3338.01 samples/sec Loss 6.0685 LearningRate 0.0542 Epoch: 5 Global Step: 65500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:33,829-Speed 3367.98 samples/sec Loss 6.0988 LearningRate 0.0542 Epoch: 5 Global Step: 65510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:36,886-Speed 3350.20 samples/sec Loss 6.0836 LearningRate 0.0542 Epoch: 5 Global Step: 65520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:39,922-Speed 3374.36 samples/sec Loss 6.0537 LearningRate 0.0542 Epoch: 5 Global Step: 65530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:42,962-Speed 3369.23 samples/sec Loss 6.0407 LearningRate 0.0542 Epoch: 5 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:22:45,967-Speed 3408.83 samples/sec Loss 6.1810 LearningRate 0.0542 Epoch: 5 Global Step: 65550 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:22:49,053-Speed 3318.55 samples/sec Loss 6.0670 LearningRate 0.0542 Epoch: 5 Global Step: 65560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:52,195-Speed 3260.11 samples/sec Loss 6.1185 LearningRate 0.0542 Epoch: 5 Global Step: 65570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:55,240-Speed 3364.74 samples/sec Loss 6.0157 LearningRate 0.0542 Epoch: 5 Global Step: 65580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:22:58,285-Speed 3363.87 samples/sec Loss 6.1088 LearningRate 0.0542 Epoch: 5 Global Step: 65590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:01,359-Speed 3331.94 samples/sec Loss 6.1867 LearningRate 0.0542 Epoch: 5 Global Step: 65600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:04,429-Speed 3337.17 samples/sec Loss 6.1124 LearningRate 0.0542 Epoch: 5 Global Step: 65610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:07,497-Speed 3337.94 samples/sec Loss 6.1147 LearningRate 0.0541 Epoch: 5 Global Step: 65620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:10,526-Speed 3382.16 samples/sec Loss 6.0163 LearningRate 0.0541 Epoch: 5 Global Step: 65630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:13,563-Speed 3373.10 samples/sec Loss 6.1767 LearningRate 0.0541 Epoch: 5 Global Step: 65640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:16,720-Speed 3243.83 samples/sec Loss 6.0803 LearningRate 0.0541 Epoch: 5 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:19,786-Speed 3340.81 samples/sec Loss 6.1771 LearningRate 0.0541 Epoch: 5 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:22,806-Speed 3391.87 samples/sec Loss 6.1521 LearningRate 0.0541 Epoch: 5 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:25,837-Speed 3379.77 
samples/sec Loss 6.1209 LearningRate 0.0541 Epoch: 5 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:28,946-Speed 3294.88 samples/sec Loss 6.0803 LearningRate 0.0541 Epoch: 5 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:32,004-Speed 3349.88 samples/sec Loss 6.1031 LearningRate 0.0541 Epoch: 5 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:35,037-Speed 3377.58 samples/sec Loss 6.0247 LearningRate 0.0541 Epoch: 5 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:38,119-Speed 3323.32 samples/sec Loss 5.9756 LearningRate 0.0541 Epoch: 5 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:41,146-Speed 3384.20 samples/sec Loss 6.0334 LearningRate 0.0541 Epoch: 5 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:44,154-Speed 3404.70 samples/sec Loss 6.0369 LearningRate 0.0541 Epoch: 5 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:47,150-Speed 3419.61 samples/sec Loss 5.9943 LearningRate 0.0541 Epoch: 5 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:50,220-Speed 3337.20 samples/sec Loss 6.1388 LearningRate 0.0541 Epoch: 5 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:23:53,298-Speed 3327.69 samples/sec Loss 6.0806 LearningRate 0.0541 Epoch: 5 Global Step: 65770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:56,368-Speed 3336.28 samples/sec Loss 6.2331 LearningRate 0.0541 Epoch: 5 Global Step: 65780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:23:59,404-Speed 3374.12 samples/sec Loss 6.0297 LearningRate 0.0540 Epoch: 5 Global Step: 65790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:02,459-Speed 3352.92 samples/sec Loss 6.2841 LearningRate 0.0540 Epoch: 5 
Global Step: 65800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:05,498-Speed 3370.67 samples/sec Loss 6.1594 LearningRate 0.0540 Epoch: 5 Global Step: 65810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:08,549-Speed 3356.85 samples/sec Loss 6.1946 LearningRate 0.0540 Epoch: 5 Global Step: 65820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:11,570-Speed 3391.27 samples/sec Loss 6.0708 LearningRate 0.0540 Epoch: 5 Global Step: 65830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:14,616-Speed 3362.97 samples/sec Loss 6.1647 LearningRate 0.0540 Epoch: 5 Global Step: 65840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:17,668-Speed 3356.38 samples/sec Loss 6.0359 LearningRate 0.0540 Epoch: 5 Global Step: 65850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:20,699-Speed 3379.01 samples/sec Loss 6.1263 LearningRate 0.0540 Epoch: 5 Global Step: 65860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:23,752-Speed 3355.89 samples/sec Loss 6.2322 LearningRate 0.0540 Epoch: 5 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:24:26,775-Speed 3387.80 samples/sec Loss 6.0872 LearningRate 0.0540 Epoch: 5 Global Step: 65880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:24:29,845-Speed 3336.91 samples/sec Loss 6.1177 LearningRate 0.0540 Epoch: 5 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:24:32,840-Speed 3420.15 samples/sec Loss 6.1473 LearningRate 0.0540 Epoch: 5 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:24:35,908-Speed 3339.01 samples/sec Loss 5.9860 LearningRate 0.0540 Epoch: 5 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:24:38,973-Speed 3341.88 samples/sec Loss 6.1003 LearningRate 0.0540 Epoch: 5 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 07:24:42,063-Speed 3314.84 samples/sec Loss 6.1906 LearningRate 0.0540 Epoch: 5 Global Step: 65930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:24:45,061-Speed 3417.64 samples/sec Loss 6.0547 LearningRate 0.0540 Epoch: 5 Global Step: 65940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:48,061-Speed 3413.74 samples/sec Loss 6.0985 LearningRate 0.0540 Epoch: 5 Global Step: 65950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:51,079-Speed 3393.84 samples/sec Loss 5.9440 LearningRate 0.0539 Epoch: 5 Global Step: 65960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:54,111-Speed 3379.25 samples/sec Loss 6.1316 LearningRate 0.0539 Epoch: 5 Global Step: 65970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:24:57,148-Speed 3373.21 samples/sec Loss 6.1076 LearningRate 0.0539 Epoch: 5 Global Step: 65980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:00,166-Speed 3393.53 samples/sec Loss 6.1202 LearningRate 0.0539 Epoch: 5 Global Step: 65990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:03,206-Speed 3369.78 samples/sec Loss 6.1849 LearningRate 0.0539 Epoch: 5 Global Step: 66000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:06,229-Speed 3388.05 samples/sec Loss 6.1246 LearningRate 0.0539 Epoch: 5 Global Step: 66010 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:09,226-Speed 3417.77 samples/sec Loss 6.1298 LearningRate 0.0539 Epoch: 5 Global Step: 66020 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:12,243-Speed 3395.85 samples/sec Loss 6.0528 LearningRate 0.0539 Epoch: 5 Global Step: 66030 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:15,254-Speed 3401.60 samples/sec Loss 6.0633 LearningRate 0.0539 Epoch: 5 Global Step: 66040 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:18,332-Speed 3327.63 
samples/sec Loss 6.1680 LearningRate 0.0539 Epoch: 5 Global Step: 66050 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:21,353-Speed 3390.70 samples/sec Loss 6.1983 LearningRate 0.0539 Epoch: 5 Global Step: 66060 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:24,375-Speed 3389.71 samples/sec Loss 6.1407 LearningRate 0.0539 Epoch: 5 Global Step: 66070 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:27,425-Speed 3358.79 samples/sec Loss 6.1767 LearningRate 0.0539 Epoch: 5 Global Step: 66080 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:30,461-Speed 3373.93 samples/sec Loss 6.1132 LearningRate 0.0539 Epoch: 5 Global Step: 66090 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:33,478-Speed 3394.77 samples/sec Loss 6.1451 LearningRate 0.0539 Epoch: 5 Global Step: 66100 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:25:36,539-Speed 3346.27 samples/sec Loss 6.2297 LearningRate 0.0539 Epoch: 5 Global Step: 66110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:39,555-Speed 3396.57 samples/sec Loss 6.1659 LearningRate 0.0539 Epoch: 5 Global Step: 66120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:42,560-Speed 3408.14 samples/sec Loss 6.1660 LearningRate 0.0538 Epoch: 5 Global Step: 66130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:45,603-Speed 3366.63 samples/sec Loss 6.2017 LearningRate 0.0538 Epoch: 5 Global Step: 66140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:48,639-Speed 3373.80 samples/sec Loss 6.1808 LearningRate 0.0538 Epoch: 5 Global Step: 66150 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:51,716-Speed 3328.98 samples/sec Loss 5.9989 LearningRate 0.0538 Epoch: 5 Global Step: 66160 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:54,749-Speed 3377.11 samples/sec Loss 6.0838 LearningRate 0.0538 Epoch: 5 
Global Step: 66170 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:25:57,756-Speed 3406.61 samples/sec Loss 6.1997 LearningRate 0.0538 Epoch: 5 Global Step: 66180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:00,781-Speed 3386.11 samples/sec Loss 6.1983 LearningRate 0.0538 Epoch: 5 Global Step: 66190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:03,847-Speed 3341.28 samples/sec Loss 6.2310 LearningRate 0.0538 Epoch: 5 Global Step: 66200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:06,881-Speed 3377.05 samples/sec Loss 6.1564 LearningRate 0.0538 Epoch: 5 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:26:09,899-Speed 3393.78 samples/sec Loss 6.2604 LearningRate 0.0538 Epoch: 5 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:26:12,904-Speed 3407.96 samples/sec Loss 6.2134 LearningRate 0.0538 Epoch: 5 Global Step: 66230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:15,971-Speed 3340.37 samples/sec Loss 6.2172 LearningRate 0.0538 Epoch: 5 Global Step: 66240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:19,092-Speed 3282.11 samples/sec Loss 6.1359 LearningRate 0.0538 Epoch: 5 Global Step: 66250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:22,105-Speed 3398.88 samples/sec Loss 6.2632 LearningRate 0.0538 Epoch: 5 Global Step: 66260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:25,221-Speed 3288.24 samples/sec Loss 6.2352 LearningRate 0.0538 Epoch: 5 Global Step: 66270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:28,317-Speed 3308.12 samples/sec Loss 6.1892 LearningRate 0.0538 Epoch: 5 Global Step: 66280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:31,371-Speed 3353.69 samples/sec Loss 6.0791 LearningRate 0.0538 Epoch: 5 Global Step: 66290 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:26:34,395-Speed 3387.63 samples/sec Loss 6.1722 LearningRate 0.0537 Epoch: 5 Global Step: 66300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:37,431-Speed 3373.70 samples/sec Loss 6.0884 LearningRate 0.0537 Epoch: 5 Global Step: 66310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:40,495-Speed 3342.82 samples/sec Loss 6.1583 LearningRate 0.0537 Epoch: 5 Global Step: 66320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:43,516-Speed 3391.02 samples/sec Loss 6.0961 LearningRate 0.0537 Epoch: 5 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:26:46,582-Speed 3341.01 samples/sec Loss 6.1132 LearningRate 0.0537 Epoch: 5 Global Step: 66340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:26:49,625-Speed 3366.47 samples/sec Loss 6.2687 LearningRate 0.0537 Epoch: 5 Global Step: 66350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:52,733-Speed 3295.99 samples/sec Loss 6.1666 LearningRate 0.0537 Epoch: 5 Global Step: 66360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:55,794-Speed 3345.68 samples/sec Loss 6.0968 LearningRate 0.0537 Epoch: 5 Global Step: 66370 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:26:58,839-Speed 3364.15 samples/sec Loss 6.1185 LearningRate 0.0537 Epoch: 5 Global Step: 66380 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:01,938-Speed 3305.94 samples/sec Loss 6.1602 LearningRate 0.0537 Epoch: 5 Global Step: 66390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:05,014-Speed 3329.33 samples/sec Loss 6.1822 LearningRate 0.0537 Epoch: 5 Global Step: 66400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:08,050-Speed 3374.97 samples/sec Loss 6.2206 LearningRate 0.0537 Epoch: 5 Global Step: 66410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:11,105-Speed 3352.12 
samples/sec Loss 6.3034 LearningRate 0.0537 Epoch: 5 Global Step: 66420 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:14,161-Speed 3352.07 samples/sec Loss 6.1667 LearningRate 0.0537 Epoch: 5 Global Step: 66430 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:17,197-Speed 3374.32 samples/sec Loss 6.2454 LearningRate 0.0537 Epoch: 5 Global Step: 66440 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:20,203-Speed 3407.67 samples/sec Loss 6.2082 LearningRate 0.0537 Epoch: 5 Global Step: 66450 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:23,230-Speed 3383.33 samples/sec Loss 6.1518 LearningRate 0.0537 Epoch: 5 Global Step: 66460 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:26,303-Speed 3333.10 samples/sec Loss 6.2230 LearningRate 0.0536 Epoch: 5 Global Step: 66470 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:29,428-Speed 3277.76 samples/sec Loss 6.1834 LearningRate 0.0536 Epoch: 5 Global Step: 66480 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:32,496-Speed 3339.13 samples/sec Loss 6.1844 LearningRate 0.0536 Epoch: 5 Global Step: 66490 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:35,545-Speed 3359.84 samples/sec Loss 6.1649 LearningRate 0.0536 Epoch: 5 Global Step: 66500 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:38,546-Speed 3412.75 samples/sec Loss 6.1532 LearningRate 0.0536 Epoch: 5 Global Step: 66510 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-04-27 07:27:41,600-Speed 3354.19 samples/sec Loss 6.2266 LearningRate 0.0536 Epoch: 5 Global Step: 66520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:44,620-Speed 3391.11 samples/sec Loss 6.2143 LearningRate 0.0536 Epoch: 5 Global Step: 66530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:47,710-Speed 3314.70 samples/sec Loss 6.2982 LearningRate 0.0536 Epoch: 5 
Global Step: 66540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:50,794-Speed 3322.62 samples/sec Loss 6.1839 LearningRate 0.0536 Epoch: 5 Global Step: 66550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:53,818-Speed 3386.85 samples/sec Loss 6.1504 LearningRate 0.0536 Epoch: 5 Global Step: 66560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:56,841-Speed 3388.86 samples/sec Loss 6.2166 LearningRate 0.0536 Epoch: 5 Global Step: 66570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:27:59,865-Speed 3386.51 samples/sec Loss 6.1423 LearningRate 0.0536 Epoch: 5 Global Step: 66580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:02,963-Speed 3307.28 samples/sec Loss 6.2773 LearningRate 0.0536 Epoch: 5 Global Step: 66590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:06,021-Speed 3348.74 samples/sec Loss 6.2048 LearningRate 0.0536 Epoch: 5 Global Step: 66600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:09,051-Speed 3381.64 samples/sec Loss 6.2172 LearningRate 0.0536 Epoch: 5 Global Step: 66610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:12,115-Speed 3342.61 samples/sec Loss 6.0876 LearningRate 0.0536 Epoch: 5 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:28:15,148-Speed 3377.47 samples/sec Loss 6.1779 LearningRate 0.0536 Epoch: 5 Global Step: 66630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:18,239-Speed 3313.69 samples/sec Loss 6.1539 LearningRate 0.0535 Epoch: 5 Global Step: 66640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:21,250-Speed 3401.79 samples/sec Loss 6.2150 LearningRate 0.0535 Epoch: 5 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:24,324-Speed 3332.86 samples/sec Loss 6.2785 LearningRate 0.0535 Epoch: 5 Global Step: 66660 Fp16 Grad Scale: 32768 Required: 16 
hours Training: 2022-04-27 07:28:27,414-Speed 3314.38 samples/sec Loss 6.1714 LearningRate 0.0535 Epoch: 5 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:30,439-Speed 3386.12 samples/sec Loss 6.2559 LearningRate 0.0535 Epoch: 5 Global Step: 66680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:33,450-Speed 3402.16 samples/sec Loss 6.2712 LearningRate 0.0535 Epoch: 5 Global Step: 66690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:36,470-Speed 3392.20 samples/sec Loss 6.2290 LearningRate 0.0535 Epoch: 5 Global Step: 66700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:39,513-Speed 3365.80 samples/sec Loss 6.2252 LearningRate 0.0535 Epoch: 5 Global Step: 66710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:42,547-Speed 3375.75 samples/sec Loss 6.1520 LearningRate 0.0535 Epoch: 5 Global Step: 66720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:45,645-Speed 3307.19 samples/sec Loss 6.3824 LearningRate 0.0535 Epoch: 5 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:28:48,703-Speed 3349.14 samples/sec Loss 6.1618 LearningRate 0.0535 Epoch: 5 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:28:51,805-Speed 3302.67 samples/sec Loss 6.2024 LearningRate 0.0535 Epoch: 5 Global Step: 66750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:54,843-Speed 3372.00 samples/sec Loss 6.2035 LearningRate 0.0535 Epoch: 5 Global Step: 66760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:28:57,850-Speed 3406.43 samples/sec Loss 6.1865 LearningRate 0.0535 Epoch: 5 Global Step: 66770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:00,968-Speed 3284.98 samples/sec Loss 6.0738 LearningRate 0.0535 Epoch: 5 Global Step: 66780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:04,068-Speed 3303.76 
samples/sec Loss 6.2731 LearningRate 0.0535 Epoch: 5 Global Step: 66790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:07,085-Speed 3395.90 samples/sec Loss 6.2265 LearningRate 0.0535 Epoch: 5 Global Step: 66800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:10,172-Speed 3318.20 samples/sec Loss 6.1621 LearningRate 0.0534 Epoch: 5 Global Step: 66810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:13,194-Speed 3389.73 samples/sec Loss 6.0911 LearningRate 0.0534 Epoch: 5 Global Step: 66820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:16,285-Speed 3312.83 samples/sec Loss 6.1665 LearningRate 0.0534 Epoch: 5 Global Step: 66830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:19,351-Speed 3340.97 samples/sec Loss 6.1996 LearningRate 0.0534 Epoch: 5 Global Step: 66840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:22,401-Speed 3359.31 samples/sec Loss 6.1680 LearningRate 0.0534 Epoch: 5 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:29:25,449-Speed 3359.72 samples/sec Loss 6.2214 LearningRate 0.0534 Epoch: 5 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 07:29:28,502-Speed 3355.38 samples/sec Loss 6.2874 LearningRate 0.0534 Epoch: 5 Global Step: 66870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:31,539-Speed 3372.96 samples/sec Loss 6.2625 LearningRate 0.0534 Epoch: 5 Global Step: 66880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 07:29:34,568-Speed 3382.15 samples/sec Loss 6.2071 LearningRate 0.0534 Epoch: 5 Global Step: 66890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:37,616-Speed 3360.44 samples/sec Loss 6.3356 LearningRate 0.0534 Epoch: 5 Global Step: 66900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:40,660-Speed 3364.94 samples/sec Loss 6.2919 LearningRate 0.0534 Epoch: 5 
Global Step: 66910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:43,670-Speed 3403.63 samples/sec Loss 6.2088 LearningRate 0.0534 Epoch: 5 Global Step: 66920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:46,693-Speed 3388.16 samples/sec Loss 6.2151 LearningRate 0.0534 Epoch: 5 Global Step: 66930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:49,749-Speed 3352.24 samples/sec Loss 6.1837 LearningRate 0.0534 Epoch: 5 Global Step: 66940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:52,794-Speed 3363.76 samples/sec Loss 6.2396 LearningRate 0.0534 Epoch: 5 Global Step: 66950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:55,865-Speed 3335.84 samples/sec Loss 6.2086 LearningRate 0.0534 Epoch: 5 Global Step: 66960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:29:58,923-Speed 3348.92 samples/sec Loss 6.1671 LearningRate 0.0534 Epoch: 5 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:30:01,997-Speed 3332.78 samples/sec Loss 6.1575 LearningRate 0.0533 Epoch: 5 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:30:05,086-Speed 3315.47 samples/sec Loss 6.1871 LearningRate 0.0533 Epoch: 5 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:30:08,141-Speed 3353.04 samples/sec Loss 6.1867 LearningRate 0.0533 Epoch: 5 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:30:11,199-Speed 3350.61 samples/sec Loss 6.2537 LearningRate 0.0533 Epoch: 5 Global Step: 67010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:30:14,318-Speed 3284.03 samples/sec Loss 6.2322 LearningRate 0.0533 Epoch: 5 Global Step: 67020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:30:17,388-Speed 3335.91 samples/sec Loss 6.3152 LearningRate 0.0533 Epoch: 5 Global Step: 67030 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:30:20,406-Speed 3394.51 samples/sec Loss 6.1286 LearningRate 0.0533 Epoch: 5 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:30:23,437-Speed 3378.45 samples/sec Loss 6.2097 LearningRate 0.0533 Epoch: 5 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:30:26,527-Speed 3315.22 samples/sec Loss 6.2158 LearningRate 0.0533 Epoch: 5 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:30:29,541-Speed 3399.35 samples/sec Loss 6.1462 LearningRate 0.0533 Epoch: 5 Global Step: 67070 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:32,593-Speed 3356.01 samples/sec Loss 6.3082 LearningRate 0.0533 Epoch: 5 Global Step: 67080 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:35,709-Speed 3286.45 samples/sec Loss 6.2772 LearningRate 0.0533 Epoch: 5 Global Step: 67090 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:38,767-Speed 3350.01 samples/sec Loss 6.2714 LearningRate 0.0533 Epoch: 5 Global Step: 67100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:41,871-Speed 3300.96 samples/sec Loss 6.3300 LearningRate 0.0533 Epoch: 5 Global Step: 67110 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:44,898-Speed 3383.24 samples/sec Loss 6.3215 LearningRate 0.0533 Epoch: 5 Global Step: 67120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:47,923-Speed 3386.57 samples/sec Loss 6.2500 LearningRate 0.0533 Epoch: 5 Global Step: 67130 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:50,950-Speed 3384.65 samples/sec Loss 6.2672 LearningRate 0.0533 Epoch: 5 Global Step: 67140 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:54,093-Speed 3258.12 samples/sec Loss 6.1589 LearningRate 0.0532 Epoch: 5 Global Step: 67150 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:30:57,125-Speed 3378.93 
samples/sec Loss 6.2826 LearningRate 0.0532 Epoch: 5 Global Step: 67160 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:00,222-Speed 3307.41 samples/sec Loss 6.2293 LearningRate 0.0532 Epoch: 5 Global Step: 67170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:03,257-Speed 3374.79 samples/sec Loss 6.2123 LearningRate 0.0532 Epoch: 5 Global Step: 67180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:06,280-Speed 3389.38 samples/sec Loss 6.1719 LearningRate 0.0532 Epoch: 5 Global Step: 67190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:09,323-Speed 3366.56 samples/sec Loss 6.3218 LearningRate 0.0532 Epoch: 5 Global Step: 67200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:12,360-Speed 3372.51 samples/sec Loss 6.2359 LearningRate 0.0532 Epoch: 5 Global Step: 67210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:15,413-Speed 3355.07 samples/sec Loss 6.2636 LearningRate 0.0532 Epoch: 5 Global Step: 67220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:18,456-Speed 3366.19 samples/sec Loss 6.1598 LearningRate 0.0532 Epoch: 5 Global Step: 67230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:21,481-Speed 3386.27 samples/sec Loss 6.2760 LearningRate 0.0532 Epoch: 5 Global Step: 67240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:24,542-Speed 3345.77 samples/sec Loss 6.2011 LearningRate 0.0532 Epoch: 5 Global Step: 67250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:27,587-Speed 3364.23 samples/sec Loss 6.1690 LearningRate 0.0532 Epoch: 5 Global Step: 67260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:31:30,721-Speed 3268.42 samples/sec Loss 6.2495 LearningRate 0.0532 Epoch: 5 Global Step: 67270 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:33,741-Speed 3392.50 samples/sec Loss 6.2290 LearningRate 0.0532 Epoch: 5 
Global Step: 67280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:36,827-Speed 3319.48 samples/sec Loss 6.2228 LearningRate 0.0532 Epoch: 5 Global Step: 67290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:39,905-Speed 3326.77 samples/sec Loss 6.1319 LearningRate 0.0532 Epoch: 5 Global Step: 67300 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:42,997-Speed 3313.49 samples/sec Loss 6.2546 LearningRate 0.0532 Epoch: 5 Global Step: 67310 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:46,028-Speed 3379.09 samples/sec Loss 6.1666 LearningRate 0.0531 Epoch: 5 Global Step: 67320 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:49,041-Speed 3400.15 samples/sec Loss 6.3304 LearningRate 0.0531 Epoch: 5 Global Step: 67330 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:52,087-Speed 3363.46 samples/sec Loss 6.1649 LearningRate 0.0531 Epoch: 5 Global Step: 67340 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:55,145-Speed 3349.26 samples/sec Loss 6.1743 LearningRate 0.0531 Epoch: 5 Global Step: 67350 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:31:58,173-Speed 3383.06 samples/sec Loss 6.3077 LearningRate 0.0531 Epoch: 5 Global Step: 67360 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:32:01,184-Speed 3402.15 samples/sec Loss 6.1317 LearningRate 0.0531 Epoch: 5 Global Step: 67370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:04,191-Speed 3405.99 samples/sec Loss 6.1742 LearningRate 0.0531 Epoch: 5 Global Step: 67380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:07,381-Speed 3211.16 samples/sec Loss 6.2969 LearningRate 0.0531 Epoch: 5 Global Step: 67390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:10,416-Speed 3375.54 samples/sec Loss 6.3369 LearningRate 0.0531 Epoch: 5 Global Step: 67400 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:32:13,465-Speed 3359.65 samples/sec Loss 6.2254 LearningRate 0.0531 Epoch: 5 Global Step: 67410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:16,534-Speed 3337.03 samples/sec Loss 6.1326 LearningRate 0.0531 Epoch: 5 Global Step: 67420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:19,561-Speed 3385.10 samples/sec Loss 6.3254 LearningRate 0.0531 Epoch: 5 Global Step: 67430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:22,590-Speed 3381.20 samples/sec Loss 6.2561 LearningRate 0.0531 Epoch: 5 Global Step: 67440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:25,674-Speed 3321.19 samples/sec Loss 6.0671 LearningRate 0.0531 Epoch: 5 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:28,690-Speed 3395.98 samples/sec Loss 6.1965 LearningRate 0.0531 Epoch: 5 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:31,749-Speed 3348.67 samples/sec Loss 6.2369 LearningRate 0.0531 Epoch: 5 Global Step: 67470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:34,800-Speed 3357.41 samples/sec Loss 6.2801 LearningRate 0.0531 Epoch: 5 Global Step: 67480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:37,832-Speed 3378.59 samples/sec Loss 6.2600 LearningRate 0.0530 Epoch: 5 Global Step: 67490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:40,939-Speed 3297.11 samples/sec Loss 6.4558 LearningRate 0.0530 Epoch: 5 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:44,013-Speed 3331.78 samples/sec Loss 6.3555 LearningRate 0.0530 Epoch: 5 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:47,082-Speed 3338.34 samples/sec Loss 6.2067 LearningRate 0.0530 Epoch: 5 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:50,128-Speed 3362.78 
samples/sec Loss 6.2829 LearningRate 0.0530 Epoch: 5 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:53,229-Speed 3302.65 samples/sec Loss 6.2158 LearningRate 0.0530 Epoch: 5 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:56,265-Speed 3374.36 samples/sec Loss 6.2769 LearningRate 0.0530 Epoch: 5 Global Step: 67550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:32:59,325-Speed 3347.09 samples/sec Loss 6.3115 LearningRate 0.0530 Epoch: 5 Global Step: 67560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:02,388-Speed 3344.59 samples/sec Loss 6.2511 LearningRate 0.0530 Epoch: 5 Global Step: 67570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:05,458-Speed 3336.58 samples/sec Loss 6.3631 LearningRate 0.0530 Epoch: 5 Global Step: 67580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:08,478-Speed 3391.72 samples/sec Loss 6.2913 LearningRate 0.0530 Epoch: 5 Global Step: 67590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:11,562-Speed 3321.71 samples/sec Loss 6.3128 LearningRate 0.0530 Epoch: 5 Global Step: 67600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:14,600-Speed 3371.46 samples/sec Loss 6.3314 LearningRate 0.0530 Epoch: 5 Global Step: 67610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:17,646-Speed 3363.35 samples/sec Loss 6.2855 LearningRate 0.0530 Epoch: 5 Global Step: 67620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:20,726-Speed 3325.53 samples/sec Loss 6.2649 LearningRate 0.0530 Epoch: 5 Global Step: 67630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:23,745-Speed 3392.92 samples/sec Loss 6.1858 LearningRate 0.0530 Epoch: 5 Global Step: 67640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:26,804-Speed 3348.51 samples/sec Loss 6.1402 LearningRate 0.0530 Epoch: 5 
Global Step: 67650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:29,907-Speed 3301.52 samples/sec Loss 6.3845 LearningRate 0.0529 Epoch: 5 Global Step: 67660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:32,941-Speed 3375.25 samples/sec Loss 6.3436 LearningRate 0.0529 Epoch: 5 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:33:36,085-Speed 3258.46 samples/sec Loss 6.2328 LearningRate 0.0529 Epoch: 5 Global Step: 67680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:39,129-Speed 3365.41 samples/sec Loss 6.2902 LearningRate 0.0529 Epoch: 5 Global Step: 67690 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:33:42,208-Speed 3326.08 samples/sec Loss 6.2736 LearningRate 0.0529 Epoch: 5 Global Step: 67700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:33:45,284-Speed 3329.90 samples/sec Loss 6.3418 LearningRate 0.0529 Epoch: 5 Global Step: 67710 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:33:48,359-Speed 3331.69 samples/sec Loss 6.3074 LearningRate 0.0529 Epoch: 5 Global Step: 67720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:33:51,437-Speed 3328.19 samples/sec Loss 6.2744 LearningRate 0.0529 Epoch: 5 Global Step: 67730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:33:54,476-Speed 3369.90 samples/sec Loss 6.3275 LearningRate 0.0529 Epoch: 5 Global Step: 67740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:33:57,546-Speed 3337.08 samples/sec Loss 6.1965 LearningRate 0.0529 Epoch: 5 Global Step: 67750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:34:00,716-Speed 3230.97 samples/sec Loss 6.3762 LearningRate 0.0529 Epoch: 5 Global Step: 67760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:34:03,768-Speed 3356.92 samples/sec Loss 6.3030 LearningRate 0.0529 Epoch: 5 Global Step: 67770 Fp16 Grad Scale: 16384 Required: 15 
hours Training: 2022-04-27 07:34:06,785-Speed 3395.58 samples/sec Loss 6.2368 LearningRate 0.0529 Epoch: 5 Global Step: 67780 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:34:09,803-Speed 3392.80 samples/sec Loss 6.3434 LearningRate 0.0529 Epoch: 5 Global Step: 67790 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:34:12,850-Speed 3362.06 samples/sec Loss 6.2896 LearningRate 0.0529 Epoch: 5 Global Step: 67800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:15,937-Speed 3318.21 samples/sec Loss 6.3288 LearningRate 0.0529 Epoch: 5 Global Step: 67810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:19,007-Speed 3337.02 samples/sec Loss 6.3683 LearningRate 0.0529 Epoch: 5 Global Step: 67820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:22,022-Speed 3397.03 samples/sec Loss 6.1344 LearningRate 0.0528 Epoch: 5 Global Step: 67830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:25,172-Speed 3252.23 samples/sec Loss 6.2680 LearningRate 0.0528 Epoch: 5 Global Step: 67840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:28,333-Speed 3240.87 samples/sec Loss 6.2648 LearningRate 0.0528 Epoch: 5 Global Step: 67850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:31,474-Speed 3260.50 samples/sec Loss 6.3416 LearningRate 0.0528 Epoch: 5 Global Step: 67860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:34,504-Speed 3381.59 samples/sec Loss 6.3326 LearningRate 0.0528 Epoch: 5 Global Step: 67870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:37,632-Speed 3274.36 samples/sec Loss 6.3008 LearningRate 0.0528 Epoch: 5 Global Step: 67880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:40,706-Speed 3332.11 samples/sec Loss 6.2443 LearningRate 0.0528 Epoch: 5 Global Step: 67890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:43,758-Speed 3356.65 
samples/sec Loss 6.3350 LearningRate 0.0528 Epoch: 5 Global Step: 67900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:34:46,824-Speed 3340.73 samples/sec Loss 6.1825 LearningRate 0.0528 Epoch: 5 Global Step: 67910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:34:49,820-Speed 3418.90 samples/sec Loss 6.3012 LearningRate 0.0528 Epoch: 5 Global Step: 67920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:52,852-Speed 3377.79 samples/sec Loss 6.3458 LearningRate 0.0528 Epoch: 5 Global Step: 67930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:55,966-Speed 3290.42 samples/sec Loss 6.3437 LearningRate 0.0528 Epoch: 5 Global Step: 67940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:34:59,097-Speed 3271.13 samples/sec Loss 6.3454 LearningRate 0.0528 Epoch: 5 Global Step: 67950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:02,158-Speed 3345.99 samples/sec Loss 6.1795 LearningRate 0.0528 Epoch: 5 Global Step: 67960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:05,312-Speed 3247.79 samples/sec Loss 6.3087 LearningRate 0.0528 Epoch: 5 Global Step: 67970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:08,372-Speed 3347.51 samples/sec Loss 6.3738 LearningRate 0.0528 Epoch: 5 Global Step: 67980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:11,483-Speed 3292.41 samples/sec Loss 6.2195 LearningRate 0.0528 Epoch: 5 Global Step: 67990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:14,518-Speed 3375.38 samples/sec Loss 6.2938 LearningRate 0.0527 Epoch: 5 Global Step: 68000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:17,540-Speed 3390.05 samples/sec Loss 6.2450 LearningRate 0.0527 Epoch: 5 Global Step: 68010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:20,572-Speed 3378.83 samples/sec Loss 6.1707 LearningRate 0.0527 Epoch: 5 
Global Step: 68020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:35:23,609-Speed 3371.95 samples/sec Loss 6.2986 LearningRate 0.0527 Epoch: 5 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:35:26,689-Speed 3325.70 samples/sec Loss 6.2469 LearningRate 0.0527 Epoch: 5 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:35:29,736-Speed 3361.51 samples/sec Loss 6.3322 LearningRate 0.0527 Epoch: 5 Global Step: 68050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:32,755-Speed 3393.43 samples/sec Loss 6.3057 LearningRate 0.0527 Epoch: 5 Global Step: 68060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:35,796-Speed 3368.68 samples/sec Loss 6.3819 LearningRate 0.0527 Epoch: 5 Global Step: 68070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:38,858-Speed 3344.74 samples/sec Loss 6.2950 LearningRate 0.0527 Epoch: 5 Global Step: 68080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:41,899-Speed 3368.42 samples/sec Loss 6.2362 LearningRate 0.0527 Epoch: 5 Global Step: 68090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:44,964-Speed 3342.57 samples/sec Loss 6.2848 LearningRate 0.0527 Epoch: 5 Global Step: 68100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:48,103-Speed 3263.11 samples/sec Loss 6.3431 LearningRate 0.0527 Epoch: 5 Global Step: 68110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:51,159-Speed 3352.23 samples/sec Loss 6.2533 LearningRate 0.0527 Epoch: 5 Global Step: 68120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:54,238-Speed 3326.35 samples/sec Loss 6.2313 LearningRate 0.0527 Epoch: 5 Global Step: 68130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:35:57,299-Speed 3346.39 samples/sec Loss 6.2685 LearningRate 0.0527 Epoch: 5 Global Step: 68140 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:36:00,391-Speed 3313.22 samples/sec Loss 6.3834 LearningRate 0.0527 Epoch: 5 Global Step: 68150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:36:03,484-Speed 3312.01 samples/sec Loss 6.1897 LearningRate 0.0527 Epoch: 5 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:36:06,591-Speed 3296.75 samples/sec Loss 6.3125 LearningRate 0.0526 Epoch: 5 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:36:09,637-Speed 3362.95 samples/sec Loss 6.2978 LearningRate 0.0526 Epoch: 5 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:36:12,697-Speed 3347.05 samples/sec Loss 6.3473 LearningRate 0.0526 Epoch: 5 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:36:15,767-Speed 3336.81 samples/sec Loss 6.2455 LearningRate 0.0526 Epoch: 5 Global Step: 68200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:36:18,857-Speed 3314.26 samples/sec Loss 6.3750 LearningRate 0.0526 Epoch: 5 Global Step: 68210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:36:21,867-Speed 3402.95 samples/sec Loss 6.3278 LearningRate 0.0526 Epoch: 5 Global Step: 68220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:36:24,939-Speed 3335.23 samples/sec Loss 6.2548 LearningRate 0.0526 Epoch: 5 Global Step: 68230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:36:27,974-Speed 3375.09 samples/sec Loss 6.3010 LearningRate 0.0526 Epoch: 5 Global Step: 68240 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:31,051-Speed 3328.90 samples/sec Loss 6.2917 LearningRate 0.0526 Epoch: 5 Global Step: 68250 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:34,173-Speed 3280.52 samples/sec Loss 6.3946 LearningRate 0.0526 Epoch: 5 Global Step: 68260 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:37,298-Speed 3278.47 
samples/sec Loss 6.2216 LearningRate 0.0526 Epoch: 5 Global Step: 68270 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:40,384-Speed 3319.09 samples/sec Loss 6.1328 LearningRate 0.0526 Epoch: 5 Global Step: 68280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:43,412-Speed 3382.51 samples/sec Loss 6.2071 LearningRate 0.0526 Epoch: 5 Global Step: 68290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:46,469-Speed 3350.63 samples/sec Loss 6.3266 LearningRate 0.0526 Epoch: 5 Global Step: 68300 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:49,557-Speed 3317.31 samples/sec Loss 6.2358 LearningRate 0.0526 Epoch: 5 Global Step: 68310 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:52,708-Speed 3250.66 samples/sec Loss 6.3809 LearningRate 0.0526 Epoch: 5 Global Step: 68320 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:55,854-Speed 3256.33 samples/sec Loss 6.3366 LearningRate 0.0526 Epoch: 5 Global Step: 68330 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:36:58,897-Speed 3365.71 samples/sec Loss 6.2842 LearningRate 0.0525 Epoch: 5 Global Step: 68340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:01,954-Speed 3351.77 samples/sec Loss 6.3679 LearningRate 0.0525 Epoch: 5 Global Step: 68350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:05,012-Speed 3348.78 samples/sec Loss 6.3270 LearningRate 0.0525 Epoch: 5 Global Step: 68360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:08,038-Speed 3385.50 samples/sec Loss 6.2863 LearningRate 0.0525 Epoch: 5 Global Step: 68370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:11,077-Speed 3370.68 samples/sec Loss 6.4104 LearningRate 0.0525 Epoch: 5 Global Step: 68380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:14,107-Speed 3380.23 samples/sec Loss 6.2873 LearningRate 0.0525 Epoch: 5 
Global Step: 68390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:17,171-Speed 3343.96 samples/sec Loss 6.2484 LearningRate 0.0525 Epoch: 5 Global Step: 68400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:20,218-Speed 3361.69 samples/sec Loss 6.3154 LearningRate 0.0525 Epoch: 5 Global Step: 68410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:23,255-Speed 3372.44 samples/sec Loss 6.3259 LearningRate 0.0525 Epoch: 5 Global Step: 68420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:26,293-Speed 3371.59 samples/sec Loss 6.3056 LearningRate 0.0525 Epoch: 5 Global Step: 68430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:29,423-Speed 3273.40 samples/sec Loss 6.2728 LearningRate 0.0525 Epoch: 5 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:37:32,458-Speed 3375.12 samples/sec Loss 6.2544 LearningRate 0.0525 Epoch: 5 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:37:35,535-Speed 3328.04 samples/sec Loss 6.2905 LearningRate 0.0525 Epoch: 5 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:37:38,588-Speed 3356.22 samples/sec Loss 6.3335 LearningRate 0.0525 Epoch: 5 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:37:41,672-Speed 3321.46 samples/sec Loss 6.2834 LearningRate 0.0525 Epoch: 5 Global Step: 68480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:37:44,756-Speed 3320.96 samples/sec Loss 6.2068 LearningRate 0.0525 Epoch: 5 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:37:47,821-Speed 3342.22 samples/sec Loss 6.2148 LearningRate 0.0525 Epoch: 5 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:37:50,927-Speed 3297.05 samples/sec Loss 6.3437 LearningRate 0.0524 Epoch: 5 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 07:37:53,967-Speed 3369.51 samples/sec Loss 6.2919 LearningRate 0.0524 Epoch: 5 Global Step: 68520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:37:57,051-Speed 3321.86 samples/sec Loss 6.3376 LearningRate 0.0524 Epoch: 5 Global Step: 68530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:00,123-Speed 3334.13 samples/sec Loss 6.2070 LearningRate 0.0524 Epoch: 5 Global Step: 68540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:03,203-Speed 3326.42 samples/sec Loss 6.2589 LearningRate 0.0524 Epoch: 5 Global Step: 68550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:06,299-Speed 3308.32 samples/sec Loss 6.2798 LearningRate 0.0524 Epoch: 5 Global Step: 68560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:09,345-Speed 3363.08 samples/sec Loss 6.2798 LearningRate 0.0524 Epoch: 5 Global Step: 68570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:12,383-Speed 3371.71 samples/sec Loss 6.2776 LearningRate 0.0524 Epoch: 5 Global Step: 68580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:15,436-Speed 3354.86 samples/sec Loss 6.3598 LearningRate 0.0524 Epoch: 5 Global Step: 68590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:18,469-Speed 3377.57 samples/sec Loss 6.3914 LearningRate 0.0524 Epoch: 5 Global Step: 68600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:21,517-Speed 3360.23 samples/sec Loss 6.2179 LearningRate 0.0524 Epoch: 5 Global Step: 68610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:38:24,631-Speed 3289.48 samples/sec Loss 6.2988 LearningRate 0.0524 Epoch: 5 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:27,699-Speed 3339.08 samples/sec Loss 6.3582 LearningRate 0.0524 Epoch: 5 Global Step: 68630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:30,871-Speed 3229.29 
samples/sec Loss 6.2581 LearningRate 0.0524 Epoch: 5 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:33,911-Speed 3369.53 samples/sec Loss 6.3058 LearningRate 0.0524 Epoch: 5 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:36,992-Speed 3323.94 samples/sec Loss 6.2593 LearningRate 0.0524 Epoch: 5 Global Step: 68660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:40,131-Speed 3263.45 samples/sec Loss 6.3556 LearningRate 0.0524 Epoch: 5 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:43,182-Speed 3357.71 samples/sec Loss 6.4178 LearningRate 0.0523 Epoch: 5 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:46,213-Speed 3378.95 samples/sec Loss 6.3205 LearningRate 0.0523 Epoch: 5 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:49,259-Speed 3362.85 samples/sec Loss 6.1861 LearningRate 0.0523 Epoch: 5 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:52,364-Speed 3299.33 samples/sec Loss 6.3236 LearningRate 0.0523 Epoch: 5 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:38:55,416-Speed 3356.86 samples/sec Loss 6.4031 LearningRate 0.0523 Epoch: 5 Global Step: 68720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 07:38:58,453-Speed 3372.69 samples/sec Loss 6.2725 LearningRate 0.0523 Epoch: 5 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:01,498-Speed 3363.85 samples/sec Loss 6.5169 LearningRate 0.0523 Epoch: 5 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:04,600-Speed 3302.30 samples/sec Loss 6.3362 LearningRate 0.0523 Epoch: 5 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:07,678-Speed 3326.89 samples/sec Loss 6.3885 LearningRate 0.0523 Epoch: 5 
Global Step: 68760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:10,725-Speed 3362.50 samples/sec Loss 6.3326 LearningRate 0.0523 Epoch: 5 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:13,803-Speed 3327.27 samples/sec Loss 6.2559 LearningRate 0.0523 Epoch: 5 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:16,888-Speed 3320.25 samples/sec Loss 6.3035 LearningRate 0.0523 Epoch: 5 Global Step: 68790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:19,938-Speed 3361.58 samples/sec Loss 6.2501 LearningRate 0.0523 Epoch: 5 Global Step: 68800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:23,004-Speed 3340.05 samples/sec Loss 6.3038 LearningRate 0.0523 Epoch: 5 Global Step: 68810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:26,135-Speed 3272.36 samples/sec Loss 6.3965 LearningRate 0.0523 Epoch: 5 Global Step: 68820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:29,178-Speed 3366.31 samples/sec Loss 6.3206 LearningRate 0.0523 Epoch: 5 Global Step: 68830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:32,232-Speed 3353.00 samples/sec Loss 6.3301 LearningRate 0.0523 Epoch: 5 Global Step: 68840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:35,241-Speed 3404.20 samples/sec Loss 6.2021 LearningRate 0.0523 Epoch: 5 Global Step: 68850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:38,302-Speed 3347.00 samples/sec Loss 6.3294 LearningRate 0.0522 Epoch: 5 Global Step: 68860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:41,439-Speed 3264.99 samples/sec Loss 6.3260 LearningRate 0.0522 Epoch: 5 Global Step: 68870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:39:44,488-Speed 3359.89 samples/sec Loss 6.2450 LearningRate 0.0522 Epoch: 5 Global Step: 68880 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:39:47,537-Speed 3359.34 samples/sec Loss 6.3503 LearningRate 0.0522 Epoch: 5 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:50,586-Speed 3360.21 samples/sec Loss 6.2959 LearningRate 0.0522 Epoch: 5 Global Step: 68900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:53,634-Speed 3360.98 samples/sec Loss 6.3167 LearningRate 0.0522 Epoch: 5 Global Step: 68910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:56,722-Speed 3317.01 samples/sec Loss 6.3189 LearningRate 0.0522 Epoch: 5 Global Step: 68920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:39:59,784-Speed 3344.88 samples/sec Loss 6.4135 LearningRate 0.0522 Epoch: 5 Global Step: 68930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:40:02,961-Speed 3224.18 samples/sec Loss 6.2942 LearningRate 0.0522 Epoch: 5 Global Step: 68940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:40:06,026-Speed 3342.43 samples/sec Loss 6.2942 LearningRate 0.0522 Epoch: 5 Global Step: 68950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:40:09,108-Speed 3322.64 samples/sec Loss 6.4667 LearningRate 0.0522 Epoch: 5 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:40:12,120-Speed 3402.02 samples/sec Loss 6.3262 LearningRate 0.0522 Epoch: 5 Global Step: 68970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:15,147-Speed 3383.76 samples/sec Loss 6.2310 LearningRate 0.0522 Epoch: 5 Global Step: 68980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:18,174-Speed 3384.15 samples/sec Loss 6.2779 LearningRate 0.0522 Epoch: 5 Global Step: 68990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:21,230-Speed 3351.61 samples/sec Loss 6.3094 LearningRate 0.0522 Epoch: 5 Global Step: 69000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:24,315-Speed 3319.68 
samples/sec Loss 6.4098 LearningRate 0.0522 Epoch: 5 Global Step: 69010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:27,367-Speed 3357.13 samples/sec Loss 6.1594 LearningRate 0.0522 Epoch: 5 Global Step: 69020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:30,486-Speed 3284.22 samples/sec Loss 6.2844 LearningRate 0.0521 Epoch: 5 Global Step: 69030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:33,549-Speed 3344.07 samples/sec Loss 6.2574 LearningRate 0.0521 Epoch: 5 Global Step: 69040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:36,572-Speed 3388.08 samples/sec Loss 6.3226 LearningRate 0.0521 Epoch: 5 Global Step: 69050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:39,575-Speed 3411.26 samples/sec Loss 6.2748 LearningRate 0.0521 Epoch: 5 Global Step: 69060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:42,620-Speed 3363.34 samples/sec Loss 6.4019 LearningRate 0.0521 Epoch: 5 Global Step: 69070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:45,692-Speed 3334.80 samples/sec Loss 6.3416 LearningRate 0.0521 Epoch: 5 Global Step: 69080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:48,733-Speed 3368.49 samples/sec Loss 6.2067 LearningRate 0.0521 Epoch: 5 Global Step: 69090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:51,803-Speed 3337.05 samples/sec Loss 6.3413 LearningRate 0.0521 Epoch: 5 Global Step: 69100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:54,861-Speed 3349.14 samples/sec Loss 6.3745 LearningRate 0.0521 Epoch: 5 Global Step: 69110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:40:57,931-Speed 3336.51 samples/sec Loss 6.3439 LearningRate 0.0521 Epoch: 5 Global Step: 69120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:00,989-Speed 3349.65 samples/sec Loss 6.3390 LearningRate 0.0521 Epoch: 5 
Global Step: 69130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:04,081-Speed 3312.88 samples/sec Loss 6.3247 LearningRate 0.0521 Epoch: 5 Global Step: 69140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:07,135-Speed 3354.21 samples/sec Loss 6.2149 LearningRate 0.0521 Epoch: 5 Global Step: 69150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:10,187-Speed 3356.50 samples/sec Loss 6.4440 LearningRate 0.0521 Epoch: 5 Global Step: 69160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:13,300-Speed 3290.44 samples/sec Loss 6.3084 LearningRate 0.0521 Epoch: 5 Global Step: 69170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:16,336-Speed 3373.32 samples/sec Loss 6.2847 LearningRate 0.0521 Epoch: 5 Global Step: 69180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:19,417-Speed 3324.77 samples/sec Loss 6.3124 LearningRate 0.0521 Epoch: 5 Global Step: 69190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:22,487-Speed 3337.23 samples/sec Loss 6.2755 LearningRate 0.0520 Epoch: 5 Global Step: 69200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:25,569-Speed 3323.69 samples/sec Loss 6.3368 LearningRate 0.0520 Epoch: 5 Global Step: 69210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:28,648-Speed 3326.85 samples/sec Loss 6.3233 LearningRate 0.0520 Epoch: 5 Global Step: 69220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:31,720-Speed 3334.36 samples/sec Loss 6.2437 LearningRate 0.0520 Epoch: 5 Global Step: 69230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:34,750-Speed 3380.35 samples/sec Loss 6.2810 LearningRate 0.0520 Epoch: 5 Global Step: 69240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:41:37,849-Speed 3305.52 samples/sec Loss 6.3636 LearningRate 0.0520 Epoch: 5 Global Step: 69250 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:41:40,897-Speed 3360.85 samples/sec Loss 6.4450 LearningRate 0.0520 Epoch: 5 Global Step: 69260 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:41:44,018-Speed 3282.39 samples/sec Loss 6.3241 LearningRate 0.0520 Epoch: 5 Global Step: 69270 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:41:47,078-Speed 3346.68 samples/sec Loss 6.3676 LearningRate 0.0520 Epoch: 5 Global Step: 69280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:41:50,242-Speed 3237.60 samples/sec Loss 6.4139 LearningRate 0.0520 Epoch: 5 Global Step: 69290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:41:53,339-Speed 3307.85 samples/sec Loss 6.1563 LearningRate 0.0520 Epoch: 5 Global Step: 69300 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:41:56,382-Speed 3365.88 samples/sec Loss 6.3144 LearningRate 0.0520 Epoch: 5 Global Step: 69310 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:41:59,448-Speed 3340.46 samples/sec Loss 6.4220 LearningRate 0.0520 Epoch: 5 Global Step: 69320 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:42:02,498-Speed 3359.40 samples/sec Loss 6.2776 LearningRate 0.0520 Epoch: 5 Global Step: 69330 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:42:05,561-Speed 3344.58 samples/sec Loss 6.4169 LearningRate 0.0520 Epoch: 5 Global Step: 69340 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:42:08,645-Speed 3320.42 samples/sec Loss 6.2718 LearningRate 0.0520 Epoch: 5 Global Step: 69350 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:42:11,704-Speed 3348.69 samples/sec Loss 6.3861 LearningRate 0.0520 Epoch: 5 Global Step: 69360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:14,771-Speed 3340.03 samples/sec Loss 6.2846 LearningRate 0.0519 Epoch: 5 Global Step: 69370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:17,868-Speed 3307.80 
samples/sec Loss 6.3808 LearningRate 0.0519 Epoch: 5 Global Step: 69380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:20,911-Speed 3366.22 samples/sec Loss 6.3791 LearningRate 0.0519 Epoch: 5 Global Step: 69390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:24,038-Speed 3276.77 samples/sec Loss 6.2794 LearningRate 0.0519 Epoch: 5 Global Step: 69400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:27,114-Speed 3329.42 samples/sec Loss 6.3033 LearningRate 0.0519 Epoch: 5 Global Step: 69410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:30,200-Speed 3318.98 samples/sec Loss 6.3849 LearningRate 0.0519 Epoch: 5 Global Step: 69420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:33,272-Speed 3334.88 samples/sec Loss 6.3276 LearningRate 0.0519 Epoch: 5 Global Step: 69430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:36,406-Speed 3267.61 samples/sec Loss 6.5255 LearningRate 0.0519 Epoch: 5 Global Step: 69440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:39,510-Speed 3299.98 samples/sec Loss 6.3166 LearningRate 0.0519 Epoch: 5 Global Step: 69450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:42,554-Speed 3365.31 samples/sec Loss 6.3011 LearningRate 0.0519 Epoch: 5 Global Step: 69460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:42:45,585-Speed 3379.59 samples/sec Loss 6.4978 LearningRate 0.0519 Epoch: 5 Global Step: 69470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:48,627-Speed 3366.68 samples/sec Loss 6.2702 LearningRate 0.0519 Epoch: 5 Global Step: 69480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:51,688-Speed 3347.69 samples/sec Loss 6.3011 LearningRate 0.0519 Epoch: 5 Global Step: 69490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:54,733-Speed 3363.87 samples/sec Loss 6.2885 LearningRate 0.0519 Epoch: 5 
Global Step: 69500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:42:57,786-Speed 3354.56 samples/sec Loss 6.3984 LearningRate 0.0519 Epoch: 5 Global Step: 69510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:00,843-Speed 3350.83 samples/sec Loss 6.3876 LearningRate 0.0519 Epoch: 5 Global Step: 69520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:03,942-Speed 3305.11 samples/sec Loss 6.3491 LearningRate 0.0519 Epoch: 5 Global Step: 69530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:07,023-Speed 3324.94 samples/sec Loss 6.3218 LearningRate 0.0519 Epoch: 5 Global Step: 69540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:10,066-Speed 3366.22 samples/sec Loss 6.2938 LearningRate 0.0518 Epoch: 5 Global Step: 69550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:13,170-Speed 3299.52 samples/sec Loss 6.3237 LearningRate 0.0518 Epoch: 5 Global Step: 69560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:16,212-Speed 3368.35 samples/sec Loss 6.3219 LearningRate 0.0518 Epoch: 5 Global Step: 69570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:19,285-Speed 3332.88 samples/sec Loss 6.2904 LearningRate 0.0518 Epoch: 5 Global Step: 69580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:22,374-Speed 3315.57 samples/sec Loss 6.3924 LearningRate 0.0518 Epoch: 5 Global Step: 69590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:25,475-Speed 3303.48 samples/sec Loss 6.2856 LearningRate 0.0518 Epoch: 5 Global Step: 69600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:28,611-Speed 3265.99 samples/sec Loss 6.4063 LearningRate 0.0518 Epoch: 5 Global Step: 69610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:31,727-Speed 3287.05 samples/sec Loss 6.3475 LearningRate 0.0518 Epoch: 5 Global Step: 69620 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:43:34,788-Speed 3346.76 samples/sec Loss 6.3129 LearningRate 0.0518 Epoch: 5 Global Step: 69630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:37,926-Speed 3264.68 samples/sec Loss 6.3970 LearningRate 0.0518 Epoch: 5 Global Step: 69640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:41,006-Speed 3325.14 samples/sec Loss 6.4451 LearningRate 0.0518 Epoch: 5 Global Step: 69650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:44,062-Speed 3352.36 samples/sec Loss 6.3753 LearningRate 0.0518 Epoch: 5 Global Step: 69660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:43:47,154-Speed 3312.92 samples/sec Loss 6.2767 LearningRate 0.0518 Epoch: 5 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:43:50,227-Speed 3333.45 samples/sec Loss 6.3311 LearningRate 0.0518 Epoch: 5 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:43:53,327-Speed 3303.89 samples/sec Loss 6.2973 LearningRate 0.0518 Epoch: 5 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:43:56,381-Speed 3354.66 samples/sec Loss 6.3205 LearningRate 0.0518 Epoch: 5 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:43:59,442-Speed 3346.03 samples/sec Loss 6.3863 LearningRate 0.0518 Epoch: 5 Global Step: 69710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:02,535-Speed 3312.16 samples/sec Loss 6.4187 LearningRate 0.0517 Epoch: 5 Global Step: 69720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:05,619-Speed 3321.12 samples/sec Loss 6.3524 LearningRate 0.0517 Epoch: 5 Global Step: 69730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:08,660-Speed 3368.24 samples/sec Loss 6.3589 LearningRate 0.0517 Epoch: 5 Global Step: 69740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:11,680-Speed 3392.30 
samples/sec Loss 6.3395 LearningRate 0.0517 Epoch: 5 Global Step: 69750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:14,743-Speed 3344.47 samples/sec Loss 6.2957 LearningRate 0.0517 Epoch: 5 Global Step: 69760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:17,804-Speed 3346.07 samples/sec Loss 6.2759 LearningRate 0.0517 Epoch: 5 Global Step: 69770 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:20,872-Speed 3338.22 samples/sec Loss 6.3579 LearningRate 0.0517 Epoch: 5 Global Step: 69780 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:23,981-Speed 3294.65 samples/sec Loss 6.2773 LearningRate 0.0517 Epoch: 5 Global Step: 69790 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:27,095-Speed 3289.59 samples/sec Loss 6.3691 LearningRate 0.0517 Epoch: 5 Global Step: 69800 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:30,229-Speed 3268.73 samples/sec Loss 6.2563 LearningRate 0.0517 Epoch: 5 Global Step: 69810 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:44:33,306-Speed 3328.35 samples/sec Loss 6.2505 LearningRate 0.0517 Epoch: 5 Global Step: 69820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:36,432-Speed 3277.36 samples/sec Loss 6.3322 LearningRate 0.0517 Epoch: 5 Global Step: 69830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:39,484-Speed 3356.49 samples/sec Loss 6.2635 LearningRate 0.0517 Epoch: 5 Global Step: 69840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:42,634-Speed 3251.51 samples/sec Loss 6.3292 LearningRate 0.0517 Epoch: 5 Global Step: 69850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:45,668-Speed 3375.74 samples/sec Loss 6.2216 LearningRate 0.0517 Epoch: 5 Global Step: 69860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:48,726-Speed 3349.99 samples/sec Loss 6.4179 LearningRate 0.0517 Epoch: 5 
Global Step: 69870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:51,843-Speed 3287.95 samples/sec Loss 6.3594 LearningRate 0.0517 Epoch: 5 Global Step: 69880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:54,973-Speed 3273.00 samples/sec Loss 6.3542 LearningRate 0.0516 Epoch: 5 Global Step: 69890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:44:58,033-Speed 3347.70 samples/sec Loss 6.4765 LearningRate 0.0516 Epoch: 5 Global Step: 69900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:01,137-Speed 3299.81 samples/sec Loss 6.3775 LearningRate 0.0516 Epoch: 5 Global Step: 69910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:04,274-Speed 3265.46 samples/sec Loss 6.3864 LearningRate 0.0516 Epoch: 5 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:07,357-Speed 3322.36 samples/sec Loss 6.3015 LearningRate 0.0516 Epoch: 5 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:10,418-Speed 3346.36 samples/sec Loss 6.3043 LearningRate 0.0516 Epoch: 5 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:13,523-Speed 3298.51 samples/sec Loss 6.2929 LearningRate 0.0516 Epoch: 5 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:16,627-Speed 3300.30 samples/sec Loss 6.3483 LearningRate 0.0516 Epoch: 5 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:19,710-Speed 3322.99 samples/sec Loss 6.4233 LearningRate 0.0516 Epoch: 5 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:22,786-Speed 3330.32 samples/sec Loss 6.2937 LearningRate 0.0516 Epoch: 5 Global Step: 69980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:25,873-Speed 3317.83 samples/sec Loss 6.1548 LearningRate 0.0516 Epoch: 5 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 07:45:28,948-Speed 3330.82 samples/sec Loss 6.2502 LearningRate 0.0516 Epoch: 5 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:32,012-Speed 3343.75 samples/sec Loss 6.2923 LearningRate 0.0516 Epoch: 5 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:35,105-Speed 3311.63 samples/sec Loss 6.3991 LearningRate 0.0516 Epoch: 5 Global Step: 70020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 07:45:38,181-Speed 3330.22 samples/sec Loss 6.2314 LearningRate 0.0516 Epoch: 5 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:45:41,246-Speed 3341.41 samples/sec Loss 6.3203 LearningRate 0.0516 Epoch: 5 Global Step: 70040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:44,310-Speed 3343.46 samples/sec Loss 6.3349 LearningRate 0.0516 Epoch: 5 Global Step: 70050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:47,351-Speed 3367.46 samples/sec Loss 6.3800 LearningRate 0.0515 Epoch: 5 Global Step: 70060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:50,385-Speed 3377.17 samples/sec Loss 6.3148 LearningRate 0.0515 Epoch: 5 Global Step: 70070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:53,470-Speed 3320.08 samples/sec Loss 6.3022 LearningRate 0.0515 Epoch: 5 Global Step: 70080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:56,520-Speed 3358.43 samples/sec Loss 6.3069 LearningRate 0.0515 Epoch: 5 Global Step: 70090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:45:59,563-Speed 3366.00 samples/sec Loss 6.2943 LearningRate 0.0515 Epoch: 5 Global Step: 70100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:02,647-Speed 3321.68 samples/sec Loss 6.4039 LearningRate 0.0515 Epoch: 5 Global Step: 70110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:05,713-Speed 3341.15 
samples/sec Loss 6.3267 LearningRate 0.0515 Epoch: 5 Global Step: 70120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:08,726-Speed 3399.28 samples/sec Loss 6.2384 LearningRate 0.0515 Epoch: 5 Global Step: 70130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:11,802-Speed 3329.24 samples/sec Loss 6.3296 LearningRate 0.0515 Epoch: 5 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:46:14,941-Speed 3263.77 samples/sec Loss 6.3765 LearningRate 0.0515 Epoch: 5 Global Step: 70150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:18,020-Speed 3326.24 samples/sec Loss 6.2960 LearningRate 0.0515 Epoch: 5 Global Step: 70160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:21,069-Speed 3360.09 samples/sec Loss 6.3980 LearningRate 0.0515 Epoch: 5 Global Step: 70170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:24,164-Speed 3308.95 samples/sec Loss 6.3156 LearningRate 0.0515 Epoch: 5 Global Step: 70180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:27,241-Speed 3329.63 samples/sec Loss 6.2820 LearningRate 0.0515 Epoch: 5 Global Step: 70190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:30,371-Speed 3272.42 samples/sec Loss 6.2953 LearningRate 0.0515 Epoch: 5 Global Step: 70200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:33,509-Speed 3264.92 samples/sec Loss 6.2845 LearningRate 0.0515 Epoch: 5 Global Step: 70210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:36,606-Speed 3306.54 samples/sec Loss 6.3419 LearningRate 0.0515 Epoch: 5 Global Step: 70220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:39,721-Speed 3288.84 samples/sec Loss 6.3176 LearningRate 0.0515 Epoch: 5 Global Step: 70230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:42,813-Speed 3312.82 samples/sec Loss 6.2987 LearningRate 0.0514 Epoch: 5 
Global Step: 70240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:45,849-Speed 3374.21 samples/sec Loss 6.2770 LearningRate 0.0514 Epoch: 5 Global Step: 70250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:49,066-Speed 3184.07 samples/sec Loss 6.4098 LearningRate 0.0514 Epoch: 5 Global Step: 70260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:52,173-Speed 3296.30 samples/sec Loss 6.4091 LearningRate 0.0514 Epoch: 5 Global Step: 70270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:55,266-Speed 3311.91 samples/sec Loss 6.4047 LearningRate 0.0514 Epoch: 5 Global Step: 70280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:46:58,323-Speed 3351.64 samples/sec Loss 6.3044 LearningRate 0.0514 Epoch: 5 Global Step: 70290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:47:01,445-Speed 3280.24 samples/sec Loss 6.3003 LearningRate 0.0514 Epoch: 5 Global Step: 70300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:47:04,518-Speed 3333.29 samples/sec Loss 6.2659 LearningRate 0.0514 Epoch: 5 Global Step: 70310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:47:07,598-Speed 3325.47 samples/sec Loss 6.2577 LearningRate 0.0514 Epoch: 5 Global Step: 70320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:47:10,644-Speed 3362.88 samples/sec Loss 6.3115 LearningRate 0.0514 Epoch: 5 Global Step: 70330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:47:13,749-Speed 3299.76 samples/sec Loss 6.3903 LearningRate 0.0514 Epoch: 5 Global Step: 70340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:47:16,857-Speed 3295.77 samples/sec Loss 6.2195 LearningRate 0.0514 Epoch: 5 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:19,937-Speed 3326.03 samples/sec Loss 6.3872 LearningRate 0.0514 Epoch: 5 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 07:47:23,034-Speed 3307.58 samples/sec Loss 6.4043 LearningRate 0.0514 Epoch: 5 Global Step: 70370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:26,136-Speed 3302.23 samples/sec Loss 6.3928 LearningRate 0.0514 Epoch: 5 Global Step: 70380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:29,217-Speed 3324.06 samples/sec Loss 6.3090 LearningRate 0.0514 Epoch: 5 Global Step: 70390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:32,314-Speed 3307.18 samples/sec Loss 6.3580 LearningRate 0.0514 Epoch: 5 Global Step: 70400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:35,449-Speed 3267.85 samples/sec Loss 6.3956 LearningRate 0.0513 Epoch: 5 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:38,548-Speed 3305.50 samples/sec Loss 6.3370 LearningRate 0.0513 Epoch: 5 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:41,650-Speed 3301.23 samples/sec Loss 6.3524 LearningRate 0.0513 Epoch: 5 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:44,734-Speed 3322.04 samples/sec Loss 6.4170 LearningRate 0.0513 Epoch: 5 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:47,774-Speed 3369.69 samples/sec Loss 6.3354 LearningRate 0.0513 Epoch: 5 Global Step: 70450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 07:47:50,837-Speed 3343.64 samples/sec Loss 6.3701 LearningRate 0.0513 Epoch: 5 Global Step: 70460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 07:47:53,930-Speed 3312.19 samples/sec Loss 6.4556 LearningRate 0.0513 Epoch: 5 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:47:56,932-Speed 3412.37 samples/sec Loss 6.3080 LearningRate 0.0513 Epoch: 5 Global Step: 70480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:00,095-Speed 3238.53 
samples/sec Loss 6.3042 LearningRate 0.0513 Epoch: 5 Global Step: 70490 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:03,203-Speed 3295.54 samples/sec Loss 6.3738 LearningRate 0.0513 Epoch: 5 Global Step: 70500 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:06,311-Speed 3296.07 samples/sec Loss 6.2716 LearningRate 0.0513 Epoch: 5 Global Step: 70510 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:09,349-Speed 3371.19 samples/sec Loss 6.3380 LearningRate 0.0513 Epoch: 5 Global Step: 70520 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:12,398-Speed 3360.03 samples/sec Loss 6.3171 LearningRate 0.0513 Epoch: 5 Global Step: 70530 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:15,498-Speed 3305.56 samples/sec Loss 6.3537 LearningRate 0.0513 Epoch: 5 Global Step: 70540 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:18,652-Speed 3246.70 samples/sec Loss 6.3712 LearningRate 0.0513 Epoch: 5 Global Step: 70550 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:21,772-Speed 3283.44 samples/sec Loss 6.3740 LearningRate 0.0513 Epoch: 5 Global Step: 70560 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:24,951-Speed 3221.68 samples/sec Loss 6.3141 LearningRate 0.0513 Epoch: 5 Global Step: 70570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:28,137-Speed 3215.71 samples/sec Loss 6.3325 LearningRate 0.0512 Epoch: 5 Global Step: 70580 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:48:31,247-Speed 3292.60 samples/sec Loss 6.3523 LearningRate 0.0512 Epoch: 5 Global Step: 70590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:34,314-Speed 3340.43 samples/sec Loss 6.3207 LearningRate 0.0512 Epoch: 5 Global Step: 70600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:37,355-Speed 3368.20 samples/sec Loss 6.2001 LearningRate 0.0512 Epoch: 5 
Global Step: 70610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:40,422-Speed 3340.21 samples/sec Loss 6.3192 LearningRate 0.0512 Epoch: 5 Global Step: 70620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:43,459-Speed 3373.29 samples/sec Loss 6.3167 LearningRate 0.0512 Epoch: 5 Global Step: 70630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:46,542-Speed 3321.70 samples/sec Loss 6.3231 LearningRate 0.0512 Epoch: 5 Global Step: 70640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:49,603-Speed 3346.45 samples/sec Loss 6.4258 LearningRate 0.0512 Epoch: 5 Global Step: 70650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:52,711-Speed 3296.60 samples/sec Loss 6.1784 LearningRate 0.0512 Epoch: 5 Global Step: 70660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:55,760-Speed 3359.37 samples/sec Loss 6.2886 LearningRate 0.0512 Epoch: 5 Global Step: 70670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:48:58,814-Speed 3353.78 samples/sec Loss 6.3862 LearningRate 0.0512 Epoch: 5 Global Step: 70680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:01,900-Speed 3319.34 samples/sec Loss 6.2785 LearningRate 0.0512 Epoch: 5 Global Step: 70690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:49:04,992-Speed 3312.98 samples/sec Loss 6.2264 LearningRate 0.0512 Epoch: 5 Global Step: 70700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:49:08,064-Speed 3334.44 samples/sec Loss 6.3462 LearningRate 0.0512 Epoch: 5 Global Step: 70710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:11,145-Speed 3324.36 samples/sec Loss 6.3920 LearningRate 0.0512 Epoch: 5 Global Step: 70720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:14,201-Speed 3352.37 samples/sec Loss 6.3978 LearningRate 0.0512 Epoch: 5 Global Step: 70730 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:49:17,259-Speed 3348.92 samples/sec Loss 6.3390 LearningRate 0.0512 Epoch: 5 Global Step: 70740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:20,341-Speed 3324.30 samples/sec Loss 6.2726 LearningRate 0.0512 Epoch: 5 Global Step: 70750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:23,416-Speed 3331.05 samples/sec Loss 6.2496 LearningRate 0.0511 Epoch: 5 Global Step: 70760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:26,474-Speed 3350.25 samples/sec Loss 6.3607 LearningRate 0.0511 Epoch: 5 Global Step: 70770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:29,534-Speed 3346.77 samples/sec Loss 6.3295 LearningRate 0.0511 Epoch: 5 Global Step: 70780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:32,590-Speed 3352.04 samples/sec Loss 6.2617 LearningRate 0.0511 Epoch: 5 Global Step: 70790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:49:35,613-Speed 3388.62 samples/sec Loss 6.2282 LearningRate 0.0511 Epoch: 5 Global Step: 70800 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:49:38,719-Speed 3298.12 samples/sec Loss 6.3500 LearningRate 0.0511 Epoch: 5 Global Step: 70810 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:49:41,816-Speed 3306.78 samples/sec Loss 6.2885 LearningRate 0.0511 Epoch: 5 Global Step: 70820 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:49:44,865-Speed 3360.16 samples/sec Loss 6.3708 LearningRate 0.0511 Epoch: 5 Global Step: 70830 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:49:47,917-Speed 3356.07 samples/sec Loss 6.3294 LearningRate 0.0511 Epoch: 5 Global Step: 70840 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:49:51,043-Speed 3276.64 samples/sec Loss 6.3596 LearningRate 0.0511 Epoch: 5 Global Step: 70850 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:49:54,133-Speed 3315.16 
samples/sec Loss 6.3393 LearningRate 0.0511 Epoch: 5 Global Step: 70860 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:49:57,165-Speed 3378.17 samples/sec Loss 6.3263 LearningRate 0.0511 Epoch: 5 Global Step: 70870 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:00,222-Speed 3350.78 samples/sec Loss 6.3204 LearningRate 0.0511 Epoch: 5 Global Step: 70880 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:03,329-Speed 3297.01 samples/sec Loss 6.2463 LearningRate 0.0511 Epoch: 5 Global Step: 70890 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:06,400-Speed 3335.13 samples/sec Loss 6.3844 LearningRate 0.0511 Epoch: 5 Global Step: 70900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:50:09,461-Speed 3346.78 samples/sec Loss 6.3194 LearningRate 0.0511 Epoch: 5 Global Step: 70910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:50:12,555-Speed 3310.80 samples/sec Loss 6.3162 LearningRate 0.0511 Epoch: 5 Global Step: 70920 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:15,675-Speed 3283.21 samples/sec Loss 6.1863 LearningRate 0.0510 Epoch: 5 Global Step: 70930 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:18,735-Speed 3346.84 samples/sec Loss 6.2803 LearningRate 0.0510 Epoch: 5 Global Step: 70940 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:21,818-Speed 3322.45 samples/sec Loss 6.4293 LearningRate 0.0510 Epoch: 5 Global Step: 70950 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:24,962-Speed 3258.62 samples/sec Loss 6.3399 LearningRate 0.0510 Epoch: 5 Global Step: 70960 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:28,121-Speed 3241.97 samples/sec Loss 6.3373 LearningRate 0.0510 Epoch: 5 Global Step: 70970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:31,248-Speed 3275.56 samples/sec Loss 6.4230 LearningRate 0.0510 Epoch: 5 
Global Step: 70980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:34,301-Speed 3355.13 samples/sec Loss 6.3316 LearningRate 0.0510 Epoch: 5 Global Step: 70990 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:37,351-Speed 3358.45 samples/sec Loss 6.2592 LearningRate 0.0510 Epoch: 5 Global Step: 71000 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:40,448-Speed 3307.29 samples/sec Loss 6.3283 LearningRate 0.0510 Epoch: 5 Global Step: 71010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:50:43,514-Speed 3341.11 samples/sec Loss 6.2097 LearningRate 0.0510 Epoch: 5 Global Step: 71020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:50:46,551-Speed 3373.05 samples/sec Loss 6.2753 LearningRate 0.0510 Epoch: 5 Global Step: 71030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:50:49,713-Speed 3238.96 samples/sec Loss 6.3130 LearningRate 0.0510 Epoch: 5 Global Step: 71040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:50:52,801-Speed 3317.12 samples/sec Loss 6.3361 LearningRate 0.0510 Epoch: 5 Global Step: 71050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:50:55,918-Speed 3286.63 samples/sec Loss 6.3311 LearningRate 0.0510 Epoch: 5 Global Step: 71060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:50:58,958-Speed 3369.04 samples/sec Loss 6.3752 LearningRate 0.0510 Epoch: 5 Global Step: 71070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:02,091-Speed 3269.57 samples/sec Loss 6.3832 LearningRate 0.0510 Epoch: 5 Global Step: 71080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:05,197-Speed 3299.56 samples/sec Loss 6.2516 LearningRate 0.0510 Epoch: 5 Global Step: 71090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:08,285-Speed 3316.53 samples/sec Loss 6.3393 LearningRate 0.0509 Epoch: 5 Global Step: 71100 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:51:11,390-Speed 3299.02 samples/sec Loss 6.3372 LearningRate 0.0509 Epoch: 5 Global Step: 71110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:14,520-Speed 3273.39 samples/sec Loss 6.3266 LearningRate 0.0509 Epoch: 5 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:51:17,569-Speed 3359.08 samples/sec Loss 6.2834 LearningRate 0.0509 Epoch: 5 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:51:20,675-Speed 3298.12 samples/sec Loss 6.3149 LearningRate 0.0509 Epoch: 5 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:51:23,712-Speed 3372.71 samples/sec Loss 6.4149 LearningRate 0.0509 Epoch: 5 Global Step: 71150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:26,792-Speed 3325.69 samples/sec Loss 6.3451 LearningRate 0.0509 Epoch: 5 Global Step: 71160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:29,877-Speed 3319.86 samples/sec Loss 6.4122 LearningRate 0.0509 Epoch: 5 Global Step: 71170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:32,969-Speed 3312.99 samples/sec Loss 6.2915 LearningRate 0.0509 Epoch: 5 Global Step: 71180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:36,125-Speed 3245.82 samples/sec Loss 6.3256 LearningRate 0.0509 Epoch: 5 Global Step: 71190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:39,263-Speed 3264.11 samples/sec Loss 6.2986 LearningRate 0.0509 Epoch: 5 Global Step: 71200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:42,353-Speed 3315.29 samples/sec Loss 6.3080 LearningRate 0.0509 Epoch: 5 Global Step: 71210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:45,468-Speed 3288.31 samples/sec Loss 6.2455 LearningRate 0.0509 Epoch: 5 Global Step: 71220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:48,644-Speed 3225.00 
samples/sec Loss 6.3497 LearningRate 0.0509 Epoch: 5 Global Step: 71230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:51,845-Speed 3199.63 samples/sec Loss 6.3398 LearningRate 0.0509 Epoch: 5 Global Step: 71240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:51:55,023-Speed 3223.59 samples/sec Loss 6.3982 LearningRate 0.0509 Epoch: 5 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:51:58,093-Speed 3336.25 samples/sec Loss 6.4150 LearningRate 0.0509 Epoch: 5 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:01,192-Speed 3305.25 samples/sec Loss 6.2611 LearningRate 0.0509 Epoch: 5 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:04,302-Speed 3294.35 samples/sec Loss 6.3350 LearningRate 0.0508 Epoch: 5 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:07,383-Speed 3324.98 samples/sec Loss 6.2541 LearningRate 0.0508 Epoch: 5 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:10,426-Speed 3365.55 samples/sec Loss 6.4616 LearningRate 0.0508 Epoch: 5 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:13,486-Speed 3347.24 samples/sec Loss 6.3031 LearningRate 0.0508 Epoch: 5 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:16,609-Speed 3279.94 samples/sec Loss 6.4573 LearningRate 0.0508 Epoch: 5 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:19,670-Speed 3346.48 samples/sec Loss 6.3007 LearningRate 0.0508 Epoch: 5 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:22,736-Speed 3341.18 samples/sec Loss 6.4501 LearningRate 0.0508 Epoch: 5 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:52:25,855-Speed 3283.78 samples/sec Loss 6.3387 LearningRate 0.0508 Epoch: 5 
Global Step: 71350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:28,973-Speed 3285.73 samples/sec Loss 6.3404 LearningRate 0.0508 Epoch: 5 Global Step: 71360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:32,104-Speed 3271.57 samples/sec Loss 6.2265 LearningRate 0.0508 Epoch: 5 Global Step: 71370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:35,152-Speed 3360.58 samples/sec Loss 6.3480 LearningRate 0.0508 Epoch: 5 Global Step: 71380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:38,272-Speed 3282.34 samples/sec Loss 6.2969 LearningRate 0.0508 Epoch: 5 Global Step: 71390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:41,482-Speed 3191.11 samples/sec Loss 6.3485 LearningRate 0.0508 Epoch: 5 Global Step: 71400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:44,580-Speed 3306.69 samples/sec Loss 6.2434 LearningRate 0.0508 Epoch: 5 Global Step: 71410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:47,715-Speed 3267.06 samples/sec Loss 6.4279 LearningRate 0.0508 Epoch: 5 Global Step: 71420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:50,825-Speed 3294.55 samples/sec Loss 6.2345 LearningRate 0.0508 Epoch: 5 Global Step: 71430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:53,884-Speed 3347.63 samples/sec Loss 6.3931 LearningRate 0.0508 Epoch: 5 Global Step: 71440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:52:56,978-Speed 3311.32 samples/sec Loss 6.4046 LearningRate 0.0507 Epoch: 5 Global Step: 71450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:00,035-Speed 3350.94 samples/sec Loss 6.3182 LearningRate 0.0507 Epoch: 5 Global Step: 71460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:03,134-Speed 3304.60 samples/sec Loss 6.4443 LearningRate 0.0507 Epoch: 5 Global Step: 71470 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 07:53:06,214-Speed 3326.13 samples/sec Loss 6.2700 LearningRate 0.0507 Epoch: 5 Global Step: 71480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:09,215-Speed 3413.31 samples/sec Loss 6.2995 LearningRate 0.0507 Epoch: 5 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:12,342-Speed 3275.03 samples/sec Loss 6.2938 LearningRate 0.0507 Epoch: 5 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:15,467-Speed 3278.64 samples/sec Loss 6.3036 LearningRate 0.0507 Epoch: 5 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:18,536-Speed 3337.12 samples/sec Loss 6.3691 LearningRate 0.0507 Epoch: 5 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:21,590-Speed 3353.66 samples/sec Loss 6.2682 LearningRate 0.0507 Epoch: 5 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:24,690-Speed 3304.79 samples/sec Loss 6.2530 LearningRate 0.0507 Epoch: 5 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:53:27,767-Speed 3329.31 samples/sec Loss 6.3906 LearningRate 0.0507 Epoch: 5 Global Step: 71550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:30,906-Speed 3262.48 samples/sec Loss 6.3562 LearningRate 0.0507 Epoch: 5 Global Step: 71560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:33,961-Speed 3353.68 samples/sec Loss 6.3272 LearningRate 0.0507 Epoch: 5 Global Step: 71570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:37,064-Speed 3301.20 samples/sec Loss 6.3495 LearningRate 0.0507 Epoch: 5 Global Step: 71580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:40,173-Speed 3294.27 samples/sec Loss 6.4176 LearningRate 0.0507 Epoch: 5 Global Step: 71590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:43,271-Speed 3306.47 
samples/sec Loss 6.2815 LearningRate 0.0507 Epoch: 5 Global Step: 71600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:46,342-Speed 3335.03 samples/sec Loss 6.3148 LearningRate 0.0507 Epoch: 5 Global Step: 71610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:49,487-Speed 3257.18 samples/sec Loss 6.1617 LearningRate 0.0507 Epoch: 5 Global Step: 71620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:53:52,610-Speed 3279.76 samples/sec Loss 6.2961 LearningRate 0.0506 Epoch: 5 Global Step: 71630 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:53:55,719-Speed 3295.01 samples/sec Loss 6.2376 LearningRate 0.0506 Epoch: 5 Global Step: 71640 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:53:58,779-Speed 3347.31 samples/sec Loss 6.4514 LearningRate 0.0506 Epoch: 5 Global Step: 71650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:01,879-Speed 3303.95 samples/sec Loss 6.2452 LearningRate 0.0506 Epoch: 5 Global Step: 71660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:05,048-Speed 3232.83 samples/sec Loss 6.3308 LearningRate 0.0506 Epoch: 5 Global Step: 71670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:08,125-Speed 3328.90 samples/sec Loss 6.2047 LearningRate 0.0506 Epoch: 5 Global Step: 71680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:11,257-Speed 3270.26 samples/sec Loss 6.4266 LearningRate 0.0506 Epoch: 5 Global Step: 71690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:14,352-Speed 3310.09 samples/sec Loss 6.2739 LearningRate 0.0506 Epoch: 5 Global Step: 71700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:17,527-Speed 3226.23 samples/sec Loss 6.3337 LearningRate 0.0506 Epoch: 5 Global Step: 71710 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:20,624-Speed 3307.02 samples/sec Loss 6.3120 LearningRate 0.0506 Epoch: 5 
Global Step: 71720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:54:23,701-Speed 3329.35 samples/sec Loss 6.2235 LearningRate 0.0506 Epoch: 5 Global Step: 71730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:26,840-Speed 3263.65 samples/sec Loss 6.2666 LearningRate 0.0506 Epoch: 5 Global Step: 71740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:29,932-Speed 3312.52 samples/sec Loss 6.1813 LearningRate 0.0506 Epoch: 5 Global Step: 71750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:32,974-Speed 3367.43 samples/sec Loss 6.2980 LearningRate 0.0506 Epoch: 5 Global Step: 71760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:36,039-Speed 3342.01 samples/sec Loss 6.3193 LearningRate 0.0506 Epoch: 5 Global Step: 71770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:39,175-Speed 3266.10 samples/sec Loss 6.2392 LearningRate 0.0506 Epoch: 5 Global Step: 71780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:42,294-Speed 3284.02 samples/sec Loss 6.4058 LearningRate 0.0506 Epoch: 5 Global Step: 71790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:45,325-Speed 3379.26 samples/sec Loss 6.1296 LearningRate 0.0505 Epoch: 5 Global Step: 71800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:48,393-Speed 3338.87 samples/sec Loss 6.1958 LearningRate 0.0505 Epoch: 5 Global Step: 71810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:51,505-Speed 3292.08 samples/sec Loss 6.3196 LearningRate 0.0505 Epoch: 5 Global Step: 71820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:54:54,614-Speed 3294.80 samples/sec Loss 6.3076 LearningRate 0.0505 Epoch: 5 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:54:57,675-Speed 3345.73 samples/sec Loss 6.2678 LearningRate 0.0505 Epoch: 5 Global Step: 71840 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:55:00,784-Speed 3294.72 samples/sec Loss 6.2466 LearningRate 0.0505 Epoch: 5 Global Step: 71850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:03,876-Speed 3313.55 samples/sec Loss 6.3136 LearningRate 0.0505 Epoch: 5 Global Step: 71860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:06,962-Speed 3318.83 samples/sec Loss 6.2742 LearningRate 0.0505 Epoch: 5 Global Step: 71870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:10,029-Speed 3340.66 samples/sec Loss 6.2142 LearningRate 0.0505 Epoch: 5 Global Step: 71880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:13,106-Speed 3328.32 samples/sec Loss 6.3531 LearningRate 0.0505 Epoch: 5 Global Step: 71890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:16,182-Speed 3330.77 samples/sec Loss 6.3577 LearningRate 0.0505 Epoch: 5 Global Step: 71900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:19,248-Speed 3340.61 samples/sec Loss 6.3313 LearningRate 0.0505 Epoch: 5 Global Step: 71910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:22,295-Speed 3362.36 samples/sec Loss 6.2683 LearningRate 0.0505 Epoch: 5 Global Step: 71920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:25,380-Speed 3320.07 samples/sec Loss 6.3461 LearningRate 0.0505 Epoch: 5 Global Step: 71930 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:28,412-Speed 3378.86 samples/sec Loss 6.2236 LearningRate 0.0505 Epoch: 5 Global Step: 71940 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:31,462-Speed 3359.01 samples/sec Loss 6.3236 LearningRate 0.0505 Epoch: 5 Global Step: 71950 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:34,516-Speed 3353.79 samples/sec Loss 6.3273 LearningRate 0.0505 Epoch: 5 Global Step: 71960 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:37,604-Speed 3317.56 
samples/sec Loss 6.3423 LearningRate 0.0505 Epoch: 5 Global Step: 71970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:40,670-Speed 3340.25 samples/sec Loss 6.2393 LearningRate 0.0504 Epoch: 5 Global Step: 71980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:43,759-Speed 3315.47 samples/sec Loss 6.3654 LearningRate 0.0504 Epoch: 5 Global Step: 71990 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:46,806-Speed 3362.32 samples/sec Loss 6.2894 LearningRate 0.0504 Epoch: 5 Global Step: 72000 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:49,931-Speed 3277.65 samples/sec Loss 6.4130 LearningRate 0.0504 Epoch: 5 Global Step: 72010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:53,041-Speed 3293.73 samples/sec Loss 6.4319 LearningRate 0.0504 Epoch: 5 Global Step: 72020 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:55:56,141-Speed 3306.42 samples/sec Loss 6.4728 LearningRate 0.0504 Epoch: 5 Global Step: 72030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:55:59,225-Speed 3321.00 samples/sec Loss 6.3150 LearningRate 0.0504 Epoch: 5 Global Step: 72040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:02,312-Speed 3318.71 samples/sec Loss 6.2894 LearningRate 0.0504 Epoch: 5 Global Step: 72050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:05,425-Speed 3290.37 samples/sec Loss 6.4020 LearningRate 0.0504 Epoch: 5 Global Step: 72060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:08,494-Speed 3337.84 samples/sec Loss 6.2806 LearningRate 0.0504 Epoch: 5 Global Step: 72070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:11,624-Speed 3272.24 samples/sec Loss 6.2552 LearningRate 0.0504 Epoch: 5 Global Step: 72080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:14,780-Speed 3245.01 samples/sec Loss 6.4009 LearningRate 0.0504 Epoch: 5 
Global Step: 72090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:17,965-Speed 3216.57 samples/sec Loss 6.2258 LearningRate 0.0504 Epoch: 5 Global Step: 72100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:21,028-Speed 3344.14 samples/sec Loss 6.3147 LearningRate 0.0504 Epoch: 5 Global Step: 72110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:24,110-Speed 3323.81 samples/sec Loss 6.2886 LearningRate 0.0504 Epoch: 5 Global Step: 72120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:27,225-Speed 3288.83 samples/sec Loss 6.3037 LearningRate 0.0504 Epoch: 5 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:56:30,286-Speed 3346.00 samples/sec Loss 6.3011 LearningRate 0.0504 Epoch: 5 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:56:33,333-Speed 3361.73 samples/sec Loss 6.3302 LearningRate 0.0503 Epoch: 5 Global Step: 72150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:56:36,398-Speed 3341.64 samples/sec Loss 6.3550 LearningRate 0.0503 Epoch: 5 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:56:39,511-Speed 3290.65 samples/sec Loss 6.2910 LearningRate 0.0503 Epoch: 5 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:56:42,637-Speed 3277.00 samples/sec Loss 6.2859 LearningRate 0.0503 Epoch: 5 Global Step: 72180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:45,700-Speed 3343.96 samples/sec Loss 6.3600 LearningRate 0.0503 Epoch: 5 Global Step: 72190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:48,773-Speed 3333.82 samples/sec Loss 6.2625 LearningRate 0.0503 Epoch: 5 Global Step: 72200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:51,884-Speed 3291.55 samples/sec Loss 6.3771 LearningRate 0.0503 Epoch: 5 Global Step: 72210 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:56:55,079-Speed 3207.22 samples/sec Loss 6.2553 LearningRate 0.0503 Epoch: 5 Global Step: 72220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:56:58,136-Speed 3350.30 samples/sec Loss 6.2683 LearningRate 0.0503 Epoch: 5 Global Step: 72230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:01,183-Speed 3361.50 samples/sec Loss 6.2306 LearningRate 0.0503 Epoch: 5 Global Step: 72240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:04,325-Speed 3259.61 samples/sec Loss 6.3706 LearningRate 0.0503 Epoch: 5 Global Step: 72250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:07,408-Speed 3323.11 samples/sec Loss 6.2289 LearningRate 0.0503 Epoch: 5 Global Step: 72260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:10,462-Speed 3354.10 samples/sec Loss 6.3350 LearningRate 0.0503 Epoch: 5 Global Step: 72270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:13,530-Speed 3337.65 samples/sec Loss 6.2897 LearningRate 0.0503 Epoch: 5 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:57:16,666-Speed 3266.41 samples/sec Loss 6.3339 LearningRate 0.0503 Epoch: 5 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 07:57:19,800-Speed 3268.68 samples/sec Loss 6.2252 LearningRate 0.0503 Epoch: 5 Global Step: 72300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:22,906-Speed 3298.26 samples/sec Loss 6.2570 LearningRate 0.0503 Epoch: 5 Global Step: 72310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:25,969-Speed 3343.81 samples/sec Loss 6.2935 LearningRate 0.0503 Epoch: 5 Global Step: 72320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:29,069-Speed 3304.84 samples/sec Loss 6.2629 LearningRate 0.0502 Epoch: 5 Global Step: 72330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:32,140-Speed 3334.92 
samples/sec Loss 6.2972 LearningRate 0.0502 Epoch: 5 Global Step: 72340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:57:35,211-Speed 3335.54 samples/sec Loss 6.2868 LearningRate 0.0502 Epoch: 5 Global Step: 72350 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:38,284-Speed 3333.00 samples/sec Loss 6.2205 LearningRate 0.0502 Epoch: 5 Global Step: 72360 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:41,356-Speed 3334.41 samples/sec Loss 6.3646 LearningRate 0.0502 Epoch: 5 Global Step: 72370 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:44,424-Speed 3339.19 samples/sec Loss 6.3295 LearningRate 0.0502 Epoch: 5 Global Step: 72380 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:47,530-Speed 3297.71 samples/sec Loss 6.2792 LearningRate 0.0502 Epoch: 5 Global Step: 72390 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:50,623-Speed 3312.08 samples/sec Loss 6.3407 LearningRate 0.0502 Epoch: 5 Global Step: 72400 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:53,697-Speed 3332.16 samples/sec Loss 6.3138 LearningRate 0.0502 Epoch: 5 Global Step: 72410 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:56,776-Speed 3326.53 samples/sec Loss 6.2695 LearningRate 0.0502 Epoch: 5 Global Step: 72420 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:57:59,929-Speed 3248.50 samples/sec Loss 6.2568 LearningRate 0.0502 Epoch: 5 Global Step: 72430 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:03,090-Speed 3240.36 samples/sec Loss 6.2650 LearningRate 0.0502 Epoch: 5 Global Step: 72440 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:06,192-Speed 3301.83 samples/sec Loss 6.4949 LearningRate 0.0502 Epoch: 5 Global Step: 72450 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:09,309-Speed 3286.71 samples/sec Loss 6.2381 LearningRate 0.0502 Epoch: 5 
Global Step: 72460 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:12,382-Speed 3332.60 samples/sec Loss 6.3514 LearningRate 0.0502 Epoch: 5 Global Step: 72470 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:15,586-Speed 3197.18 samples/sec Loss 6.2856 LearningRate 0.0502 Epoch: 5 Global Step: 72480 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:18,674-Speed 3316.76 samples/sec Loss 6.2849 LearningRate 0.0502 Epoch: 5 Global Step: 72490 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:21,760-Speed 3319.55 samples/sec Loss 6.3438 LearningRate 0.0501 Epoch: 5 Global Step: 72500 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:24,880-Speed 3282.94 samples/sec Loss 6.2794 LearningRate 0.0501 Epoch: 5 Global Step: 72510 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:28,017-Speed 3265.25 samples/sec Loss 6.3387 LearningRate 0.0501 Epoch: 5 Global Step: 72520 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:31,187-Speed 3230.94 samples/sec Loss 6.3165 LearningRate 0.0501 Epoch: 5 Global Step: 72530 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:34,240-Speed 3355.93 samples/sec Loss 6.2784 LearningRate 0.0501 Epoch: 5 Global Step: 72540 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:37,408-Speed 3232.76 samples/sec Loss 6.2534 LearningRate 0.0501 Epoch: 5 Global Step: 72550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:58:40,492-Speed 3321.46 samples/sec Loss 6.2200 LearningRate 0.0501 Epoch: 5 Global Step: 72560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:58:43,592-Speed 3304.22 samples/sec Loss 6.1647 LearningRate 0.0501 Epoch: 5 Global Step: 72570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:58:46,676-Speed 3321.66 samples/sec Loss 6.3135 LearningRate 0.0501 Epoch: 5 Global Step: 72580 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 07:58:49,780-Speed 3299.51 samples/sec Loss 6.3593 LearningRate 0.0501 Epoch: 5 Global Step: 72590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:58:52,888-Speed 3295.43 samples/sec Loss 6.3034 LearningRate 0.0501 Epoch: 5 Global Step: 72600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:58:55,947-Speed 3349.08 samples/sec Loss 6.2809 LearningRate 0.0501 Epoch: 5 Global Step: 72610 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:58:59,029-Speed 3323.84 samples/sec Loss 6.2639 LearningRate 0.0501 Epoch: 5 Global Step: 72620 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:02,220-Speed 3209.74 samples/sec Loss 6.3119 LearningRate 0.0501 Epoch: 5 Global Step: 72630 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:05,335-Speed 3288.81 samples/sec Loss 6.3411 LearningRate 0.0501 Epoch: 5 Global Step: 72640 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:08,417-Speed 3323.72 samples/sec Loss 6.3416 LearningRate 0.0501 Epoch: 5 Global Step: 72650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:11,499-Speed 3323.53 samples/sec Loss 6.2724 LearningRate 0.0501 Epoch: 5 Global Step: 72660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:14,617-Speed 3284.77 samples/sec Loss 6.2588 LearningRate 0.0501 Epoch: 5 Global Step: 72670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:17,791-Speed 3228.06 samples/sec Loss 6.2634 LearningRate 0.0500 Epoch: 5 Global Step: 72680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:20,859-Speed 3337.92 samples/sec Loss 6.2916 LearningRate 0.0500 Epoch: 5 Global Step: 72690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:23,986-Speed 3276.36 samples/sec Loss 6.3505 LearningRate 0.0500 Epoch: 5 Global Step: 72700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 07:59:27,153-Speed 3234.22 
samples/sec Loss 6.4173 LearningRate 0.0500 Epoch: 5 Global Step: 72710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:30,244-Speed 3313.71 samples/sec Loss 6.3127 LearningRate 0.0500 Epoch: 5 Global Step: 72720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:33,327-Speed 3321.75 samples/sec Loss 6.3922 LearningRate 0.0500 Epoch: 5 Global Step: 72730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:36,376-Speed 3360.69 samples/sec Loss 6.3293 LearningRate 0.0500 Epoch: 5 Global Step: 72740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:39,482-Speed 3297.27 samples/sec Loss 6.3645 LearningRate 0.0500 Epoch: 5 Global Step: 72750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:42,589-Speed 3296.28 samples/sec Loss 6.3261 LearningRate 0.0500 Epoch: 5 Global Step: 72760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:45,659-Speed 3337.01 samples/sec Loss 6.3679 LearningRate 0.0500 Epoch: 5 Global Step: 72770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:48,790-Speed 3271.26 samples/sec Loss 6.2195 LearningRate 0.0500 Epoch: 5 Global Step: 72780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:51,954-Speed 3237.19 samples/sec Loss 6.3138 LearningRate 0.0500 Epoch: 5 Global Step: 72790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:55,053-Speed 3305.76 samples/sec Loss 6.2513 LearningRate 0.0500 Epoch: 5 Global Step: 72800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 07:59:58,090-Speed 3372.46 samples/sec Loss 6.1813 LearningRate 0.0500 Epoch: 5 Global Step: 72810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:01,140-Speed 3358.69 samples/sec Loss 6.2094 LearningRate 0.0500 Epoch: 5 Global Step: 72820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:04,208-Speed 3338.41 samples/sec Loss 6.2608 LearningRate 0.0500 Epoch: 5 
Global Step: 72830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:07,269-Speed 3347.49 samples/sec Loss 6.3219 LearningRate 0.0500 Epoch: 5 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:10,327-Speed 3349.65 samples/sec Loss 6.3747 LearningRate 0.0499 Epoch: 5 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:13,437-Speed 3292.52 samples/sec Loss 6.2975 LearningRate 0.0499 Epoch: 5 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:16,507-Speed 3336.34 samples/sec Loss 6.2618 LearningRate 0.0499 Epoch: 5 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:19,585-Speed 3328.21 samples/sec Loss 6.2139 LearningRate 0.0499 Epoch: 5 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:22,677-Speed 3312.72 samples/sec Loss 6.2852 LearningRate 0.0499 Epoch: 5 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:25,794-Speed 3286.47 samples/sec Loss 6.3636 LearningRate 0.0499 Epoch: 5 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:28,856-Speed 3345.50 samples/sec Loss 6.2480 LearningRate 0.0499 Epoch: 5 Global Step: 72910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:00:31,980-Speed 3279.10 samples/sec Loss 6.1731 LearningRate 0.0499 Epoch: 5 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:35,169-Speed 3212.08 samples/sec Loss 6.3054 LearningRate 0.0499 Epoch: 5 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:00:38,269-Speed 3304.84 samples/sec Loss 6.3081 LearningRate 0.0499 Epoch: 5 Global Step: 72940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:00:41,397-Speed 3274.23 samples/sec Loss 6.2684 LearningRate 0.0499 Epoch: 5 Global Step: 72950 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 08:00:44,461-Speed 3343.08 samples/sec Loss 6.2990 LearningRate 0.0499 Epoch: 5 Global Step: 72960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:00:47,533-Speed 3334.19 samples/sec Loss 6.1646 LearningRate 0.0499 Epoch: 5 Global Step: 72970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:00:50,641-Speed 3295.78 samples/sec Loss 6.3666 LearningRate 0.0499 Epoch: 5 Global Step: 72980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:00:53,763-Speed 3281.75 samples/sec Loss 6.3265 LearningRate 0.0499 Epoch: 5 Global Step: 72990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:00:56,823-Speed 3347.51 samples/sec Loss 6.2656 LearningRate 0.0499 Epoch: 5 Global Step: 73000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:00:59,891-Speed 3338.83 samples/sec Loss 6.1653 LearningRate 0.0499 Epoch: 5 Global Step: 73010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:03,157-Speed 3136.56 samples/sec Loss 6.4417 LearningRate 0.0499 Epoch: 5 Global Step: 73020 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:06,243-Speed 3318.77 samples/sec Loss 6.3665 LearningRate 0.0498 Epoch: 5 Global Step: 73030 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:09,298-Speed 3353.17 samples/sec Loss 6.2862 LearningRate 0.0498 Epoch: 5 Global Step: 73040 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:12,474-Speed 3225.24 samples/sec Loss 6.3002 LearningRate 0.0498 Epoch: 5 Global Step: 73050 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:15,587-Speed 3290.28 samples/sec Loss 6.3379 LearningRate 0.0498 Epoch: 5 Global Step: 73060 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:18,685-Speed 3306.72 samples/sec Loss 6.2308 LearningRate 0.0498 Epoch: 5 Global Step: 73070 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:21,781-Speed 3308.50 
samples/sec Loss 6.3083 LearningRate 0.0498 Epoch: 5 Global Step: 73080 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:24,923-Speed 3260.12 samples/sec Loss 6.3065 LearningRate 0.0498 Epoch: 5 Global Step: 73090 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:28,007-Speed 3321.17 samples/sec Loss 6.3973 LearningRate 0.0498 Epoch: 5 Global Step: 73100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:01:31,082-Speed 3331.88 samples/sec Loss 6.2530 LearningRate 0.0498 Epoch: 5 Global Step: 73110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:34,181-Speed 3305.33 samples/sec Loss 6.3576 LearningRate 0.0498 Epoch: 5 Global Step: 73120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:37,336-Speed 3246.98 samples/sec Loss 6.2302 LearningRate 0.0498 Epoch: 5 Global Step: 73130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:40,449-Speed 3290.33 samples/sec Loss 6.2738 LearningRate 0.0498 Epoch: 5 Global Step: 73140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:43,599-Speed 3251.30 samples/sec Loss 6.2988 LearningRate 0.0498 Epoch: 5 Global Step: 73150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:46,666-Speed 3339.91 samples/sec Loss 6.0998 LearningRate 0.0498 Epoch: 5 Global Step: 73160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:49,736-Speed 3337.14 samples/sec Loss 6.2387 LearningRate 0.0498 Epoch: 5 Global Step: 73170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:52,832-Speed 3308.52 samples/sec Loss 6.2364 LearningRate 0.0498 Epoch: 5 Global Step: 73180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:55,951-Speed 3284.45 samples/sec Loss 6.2026 LearningRate 0.0498 Epoch: 5 Global Step: 73190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:01:59,033-Speed 3322.64 samples/sec Loss 6.3697 LearningRate 0.0498 Epoch: 5 
Global Step: 73200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:02,237-Speed 3196.97 samples/sec Loss 6.2902 LearningRate 0.0497 Epoch: 5 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:02:05,369-Speed 3270.96 samples/sec Loss 6.2131 LearningRate 0.0497 Epoch: 5 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:02:08,459-Speed 3314.37 samples/sec Loss 6.1600 LearningRate 0.0497 Epoch: 5 Global Step: 73230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:11,548-Speed 3317.57 samples/sec Loss 6.2844 LearningRate 0.0497 Epoch: 5 Global Step: 73240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:14,730-Speed 3218.34 samples/sec Loss 6.3148 LearningRate 0.0497 Epoch: 5 Global Step: 73250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:17,837-Speed 3296.40 samples/sec Loss 6.3010 LearningRate 0.0497 Epoch: 5 Global Step: 73260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:20,944-Speed 3296.73 samples/sec Loss 6.3671 LearningRate 0.0497 Epoch: 5 Global Step: 73270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:24,007-Speed 3344.60 samples/sec Loss 6.2645 LearningRate 0.0497 Epoch: 5 Global Step: 73280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:27,101-Speed 3310.97 samples/sec Loss 6.2652 LearningRate 0.0497 Epoch: 5 Global Step: 73290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:30,202-Speed 3303.51 samples/sec Loss 6.3802 LearningRate 0.0497 Epoch: 5 Global Step: 73300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:33,310-Speed 3295.30 samples/sec Loss 6.3517 LearningRate 0.0497 Epoch: 5 Global Step: 73310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:36,421-Speed 3292.54 samples/sec Loss 6.2593 LearningRate 0.0497 Epoch: 5 Global Step: 73320 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 08:02:39,513-Speed 3312.05 samples/sec Loss 6.3278 LearningRate 0.0497 Epoch: 5 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:02:42,630-Speed 3286.05 samples/sec Loss 6.2773 LearningRate 0.0497 Epoch: 5 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:02:45,698-Speed 3339.59 samples/sec Loss 6.2582 LearningRate 0.0497 Epoch: 5 Global Step: 73350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:02:48,778-Speed 3325.08 samples/sec Loss 6.2736 LearningRate 0.0497 Epoch: 5 Global Step: 73360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:51,934-Speed 3245.77 samples/sec Loss 6.2241 LearningRate 0.0497 Epoch: 5 Global Step: 73370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:54,997-Speed 3344.06 samples/sec Loss 6.2900 LearningRate 0.0496 Epoch: 5 Global Step: 73380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:02:58,067-Speed 3336.69 samples/sec Loss 6.2165 LearningRate 0.0496 Epoch: 5 Global Step: 73390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:01,214-Speed 3255.24 samples/sec Loss 6.4148 LearningRate 0.0496 Epoch: 5 Global Step: 73400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:04,324-Speed 3293.33 samples/sec Loss 6.2760 LearningRate 0.0496 Epoch: 5 Global Step: 73410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:07,476-Speed 3249.55 samples/sec Loss 6.2413 LearningRate 0.0496 Epoch: 5 Global Step: 73420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:10,533-Speed 3350.20 samples/sec Loss 6.2680 LearningRate 0.0496 Epoch: 5 Global Step: 73430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:13,626-Speed 3311.78 samples/sec Loss 6.1916 LearningRate 0.0496 Epoch: 5 Global Step: 73440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:16,718-Speed 3313.21 
samples/sec Loss 6.1366 LearningRate 0.0496 Epoch: 5 Global Step: 73450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:19,836-Speed 3285.30 samples/sec Loss 6.2282 LearningRate 0.0496 Epoch: 5 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:03:22,936-Speed 3304.25 samples/sec Loss 6.1743 LearningRate 0.0496 Epoch: 5 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:03:26,048-Speed 3290.62 samples/sec Loss 6.2259 LearningRate 0.0496 Epoch: 5 Global Step: 73480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:03:29,146-Speed 3307.20 samples/sec Loss 6.2915 LearningRate 0.0496 Epoch: 5 Global Step: 73490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:32,207-Speed 3346.57 samples/sec Loss 6.1911 LearningRate 0.0496 Epoch: 5 Global Step: 73500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:35,279-Speed 3334.07 samples/sec Loss 6.3540 LearningRate 0.0496 Epoch: 5 Global Step: 73510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:03:38,421-Speed 3260.11 samples/sec Loss 6.3235 LearningRate 0.0496 Epoch: 5 Global Step: 73520 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:03:41,569-Speed 3253.14 samples/sec Loss 6.2505 LearningRate 0.0496 Epoch: 5 Global Step: 73530 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:03:44,691-Speed 3280.67 samples/sec Loss 6.3697 LearningRate 0.0496 Epoch: 5 Global Step: 73540 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:03:47,882-Speed 3210.93 samples/sec Loss 6.2254 LearningRate 0.0496 Epoch: 5 Global Step: 73550 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:03:51,020-Speed 3263.54 samples/sec Loss 6.2147 LearningRate 0.0495 Epoch: 5 Global Step: 73560 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:03:54,137-Speed 3286.94 samples/sec Loss 6.3621 LearningRate 0.0495 Epoch: 5 
Global Step: 73570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:03:57,241-Speed 3300.13 samples/sec Loss 6.2853 LearningRate 0.0495 Epoch: 5 Global Step: 73580 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:04:00,376-Speed 3267.27 samples/sec Loss 6.3387 LearningRate 0.0495 Epoch: 5 Global Step: 73590 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:04:03,440-Speed 3343.04 samples/sec Loss 6.2433 LearningRate 0.0495 Epoch: 5 Global Step: 73600 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:04:06,617-Speed 3224.79 samples/sec Loss 6.2322 LearningRate 0.0495 Epoch: 5 Global Step: 73610 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:04:09,704-Speed 3318.04 samples/sec Loss 6.1974 LearningRate 0.0495 Epoch: 5 Global Step: 73620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:12,843-Speed 3263.02 samples/sec Loss 6.2891 LearningRate 0.0495 Epoch: 5 Global Step: 73630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:16,033-Speed 3210.91 samples/sec Loss 6.3169 LearningRate 0.0495 Epoch: 5 Global Step: 73640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:19,145-Speed 3292.02 samples/sec Loss 6.2347 LearningRate 0.0495 Epoch: 5 Global Step: 73650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:22,211-Speed 3341.00 samples/sec Loss 6.2239 LearningRate 0.0495 Epoch: 5 Global Step: 73660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:25,288-Speed 3328.68 samples/sec Loss 6.2423 LearningRate 0.0495 Epoch: 5 Global Step: 73670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:28,473-Speed 3216.71 samples/sec Loss 6.3270 LearningRate 0.0495 Epoch: 5 Global Step: 73680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:31,568-Speed 3309.69 samples/sec Loss 6.2402 LearningRate 0.0495 Epoch: 5 Global Step: 73690 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 08:04:34,622-Speed 3353.92 samples/sec Loss 6.3277 LearningRate 0.0495 Epoch: 5 Global Step: 73700 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:37,731-Speed 3293.99 samples/sec Loss 6.3248 LearningRate 0.0495 Epoch: 5 Global Step: 73710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:40,822-Speed 3314.10 samples/sec Loss 6.2115 LearningRate 0.0495 Epoch: 5 Global Step: 73720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:43,890-Speed 3339.30 samples/sec Loss 6.2974 LearningRate 0.0494 Epoch: 5 Global Step: 73730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:46,995-Speed 3299.01 samples/sec Loss 6.2707 LearningRate 0.0494 Epoch: 5 Global Step: 73740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:50,078-Speed 3322.35 samples/sec Loss 6.1733 LearningRate 0.0494 Epoch: 5 Global Step: 73750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:53,147-Speed 3338.30 samples/sec Loss 6.2295 LearningRate 0.0494 Epoch: 5 Global Step: 73760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:56,229-Speed 3323.61 samples/sec Loss 6.1850 LearningRate 0.0494 Epoch: 5 Global Step: 73770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:04:59,264-Speed 3374.98 samples/sec Loss 6.2945 LearningRate 0.0494 Epoch: 5 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:02,352-Speed 3317.22 samples/sec Loss 6.2407 LearningRate 0.0494 Epoch: 5 Global Step: 73790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:05,441-Speed 3316.26 samples/sec Loss 6.3133 LearningRate 0.0494 Epoch: 5 Global Step: 73800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:08,519-Speed 3327.70 samples/sec Loss 6.2973 LearningRate 0.0494 Epoch: 5 Global Step: 73810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:11,636-Speed 3286.16 
samples/sec Loss 6.3923 LearningRate 0.0494 Epoch: 5 Global Step: 73820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:14,726-Speed 3314.87 samples/sec Loss 6.3285 LearningRate 0.0494 Epoch: 5 Global Step: 73830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:17,815-Speed 3316.48 samples/sec Loss 6.3933 LearningRate 0.0494 Epoch: 5 Global Step: 73840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:20,882-Speed 3339.19 samples/sec Loss 6.3440 LearningRate 0.0494 Epoch: 5 Global Step: 73850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:23,942-Speed 3347.72 samples/sec Loss 6.3097 LearningRate 0.0494 Epoch: 5 Global Step: 73860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:27,024-Speed 3323.83 samples/sec Loss 6.2182 LearningRate 0.0494 Epoch: 5 Global Step: 73870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:30,146-Speed 3281.12 samples/sec Loss 6.3199 LearningRate 0.0494 Epoch: 5 Global Step: 73880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:33,203-Speed 3350.51 samples/sec Loss 6.2989 LearningRate 0.0494 Epoch: 5 Global Step: 73890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:36,305-Speed 3301.83 samples/sec Loss 6.0992 LearningRate 0.0494 Epoch: 5 Global Step: 73900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:39,407-Speed 3302.26 samples/sec Loss 6.2055 LearningRate 0.0493 Epoch: 5 Global Step: 73910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:42,495-Speed 3317.16 samples/sec Loss 6.1223 LearningRate 0.0493 Epoch: 5 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:05:45,541-Speed 3363.20 samples/sec Loss 6.2764 LearningRate 0.0493 Epoch: 5 Global Step: 73930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:48,610-Speed 3337.61 samples/sec Loss 6.2793 LearningRate 0.0493 Epoch: 5 
Global Step: 73940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:51,726-Speed 3287.91 samples/sec Loss 6.3196 LearningRate 0.0493 Epoch: 5 Global Step: 73950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:54,789-Speed 3344.03 samples/sec Loss 6.2065 LearningRate 0.0493 Epoch: 5 Global Step: 73960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:05:57,875-Speed 3318.63 samples/sec Loss 6.3523 LearningRate 0.0493 Epoch: 5 Global Step: 73970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:01,001-Speed 3277.61 samples/sec Loss 6.3104 LearningRate 0.0493 Epoch: 5 Global Step: 73980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:04,087-Speed 3318.86 samples/sec Loss 6.2612 LearningRate 0.0493 Epoch: 5 Global Step: 73990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:07,209-Speed 3281.78 samples/sec Loss 6.2482 LearningRate 0.0493 Epoch: 5 Global Step: 74000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:10,280-Speed 3335.20 samples/sec Loss 6.2392 LearningRate 0.0493 Epoch: 5 Global Step: 74010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:13,378-Speed 3306.74 samples/sec Loss 6.2243 LearningRate 0.0493 Epoch: 5 Global Step: 74020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:16,431-Speed 3355.49 samples/sec Loss 6.2284 LearningRate 0.0493 Epoch: 5 Global Step: 74030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:06:19,493-Speed 3345.49 samples/sec Loss 6.2381 LearningRate 0.0493 Epoch: 5 Global Step: 74040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:06:22,580-Speed 3317.19 samples/sec Loss 6.3222 LearningRate 0.0493 Epoch: 5 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:06:25,690-Speed 3294.12 samples/sec Loss 6.2339 LearningRate 0.0493 Epoch: 5 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 08:06:28,908-Speed 3182.67 samples/sec Loss 6.1948 LearningRate 0.0493 Epoch: 5 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:06:32,044-Speed 3266.32 samples/sec Loss 6.3048 LearningRate 0.0493 Epoch: 5 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:06:35,152-Speed 3295.99 samples/sec Loss 6.3242 LearningRate 0.0492 Epoch: 5 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:06:38,225-Speed 3332.97 samples/sec Loss 6.2989 LearningRate 0.0492 Epoch: 5 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:06:41,369-Speed 3258.98 samples/sec Loss 6.2322 LearningRate 0.0492 Epoch: 5 Global Step: 74110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:44,499-Speed 3272.20 samples/sec Loss 6.2936 LearningRate 0.0492 Epoch: 5 Global Step: 74120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:47,615-Speed 3287.22 samples/sec Loss 6.2197 LearningRate 0.0492 Epoch: 5 Global Step: 74130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:50,674-Speed 3349.02 samples/sec Loss 6.2669 LearningRate 0.0492 Epoch: 5 Global Step: 74140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:53,795-Speed 3281.87 samples/sec Loss 6.2111 LearningRate 0.0492 Epoch: 5 Global Step: 74150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:56,856-Speed 3345.46 samples/sec Loss 6.2244 LearningRate 0.0492 Epoch: 5 Global Step: 74160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:06:59,944-Speed 3318.32 samples/sec Loss 6.1487 LearningRate 0.0492 Epoch: 5 Global Step: 74170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:03,065-Speed 3281.36 samples/sec Loss 6.2687 LearningRate 0.0492 Epoch: 5 Global Step: 74180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:06,253-Speed 3213.50 
samples/sec Loss 6.2501 LearningRate 0.0492 Epoch: 5 Global Step: 74190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:09,312-Speed 3348.76 samples/sec Loss 6.3486 LearningRate 0.0492 Epoch: 5 Global Step: 74200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:12,474-Speed 3239.06 samples/sec Loss 6.1878 LearningRate 0.0492 Epoch: 5 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:07:15,521-Speed 3362.19 samples/sec Loss 6.1210 LearningRate 0.0492 Epoch: 5 Global Step: 74220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:07:18,689-Speed 3233.07 samples/sec Loss 6.2695 LearningRate 0.0492 Epoch: 5 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:07:21,773-Speed 3321.88 samples/sec Loss 6.2484 LearningRate 0.0492 Epoch: 5 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:07:24,831-Speed 3349.38 samples/sec Loss 6.2515 LearningRate 0.0492 Epoch: 5 Global Step: 74250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:27,896-Speed 3342.69 samples/sec Loss 6.1832 LearningRate 0.0492 Epoch: 5 Global Step: 74260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:31,038-Speed 3259.83 samples/sec Loss 6.2729 LearningRate 0.0491 Epoch: 5 Global Step: 74270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:34,135-Speed 3307.69 samples/sec Loss 6.3415 LearningRate 0.0491 Epoch: 5 Global Step: 74280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:37,244-Speed 3294.82 samples/sec Loss 6.3540 LearningRate 0.0491 Epoch: 5 Global Step: 74290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:40,307-Speed 3343.67 samples/sec Loss 6.2442 LearningRate 0.0491 Epoch: 5 Global Step: 74300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:43,450-Speed 3259.37 samples/sec Loss 6.2128 LearningRate 0.0491 Epoch: 5 
Global Step: 74310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:46,507-Speed 3351.53 samples/sec Loss 6.2052 LearningRate 0.0491 Epoch: 5 Global Step: 74320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:49,634-Speed 3274.90 samples/sec Loss 6.2251 LearningRate 0.0491 Epoch: 5 Global Step: 74330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:52,772-Speed 3264.05 samples/sec Loss 6.2853 LearningRate 0.0491 Epoch: 5 Global Step: 74340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:07:55,887-Speed 3288.73 samples/sec Loss 6.2365 LearningRate 0.0491 Epoch: 5 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:07:58,964-Speed 3329.43 samples/sec Loss 6.2190 LearningRate 0.0491 Epoch: 5 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:08:02,099-Speed 3266.76 samples/sec Loss 6.3035 LearningRate 0.0491 Epoch: 5 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:08:05,286-Speed 3214.79 samples/sec Loss 6.2790 LearningRate 0.0491 Epoch: 5 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:08:08,364-Speed 3327.05 samples/sec Loss 6.1136 LearningRate 0.0491 Epoch: 5 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:08:11,432-Speed 3339.75 samples/sec Loss 6.1470 LearningRate 0.0491 Epoch: 5 Global Step: 74400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:08:14,511-Speed 3326.74 samples/sec Loss 6.2829 LearningRate 0.0491 Epoch: 5 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:08:17,543-Speed 3378.26 samples/sec Loss 6.2237 LearningRate 0.0491 Epoch: 5 Global Step: 74420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:20,628-Speed 3320.50 samples/sec Loss 6.2684 LearningRate 0.0491 Epoch: 5 Global Step: 74430 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 08:08:23,780-Speed 3249.16 samples/sec Loss 6.2244 LearningRate 0.0490 Epoch: 5 Global Step: 74440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:26,897-Speed 3285.77 samples/sec Loss 6.2075 LearningRate 0.0490 Epoch: 5 Global Step: 74450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:30,031-Speed 3268.74 samples/sec Loss 6.1391 LearningRate 0.0490 Epoch: 5 Global Step: 74460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:33,164-Speed 3269.71 samples/sec Loss 6.2403 LearningRate 0.0490 Epoch: 5 Global Step: 74470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:36,350-Speed 3215.17 samples/sec Loss 6.1697 LearningRate 0.0490 Epoch: 5 Global Step: 74480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:39,458-Speed 3296.30 samples/sec Loss 6.2027 LearningRate 0.0490 Epoch: 5 Global Step: 74490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:42,570-Speed 3291.75 samples/sec Loss 6.2742 LearningRate 0.0490 Epoch: 5 Global Step: 74500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:45,625-Speed 3352.65 samples/sec Loss 6.1386 LearningRate 0.0490 Epoch: 5 Global Step: 74510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:08:48,917-Speed 3111.30 samples/sec Loss 6.3432 LearningRate 0.0490 Epoch: 5 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:20,724-Speed 321.97 samples/sec Loss 5.6774 LearningRate 0.0490 Epoch: 6 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:24,368-Speed 2810.78 samples/sec Loss 4.9113 LearningRate 0.0490 Epoch: 6 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:27,511-Speed 3259.13 samples/sec Loss 4.8665 LearningRate 0.0490 Epoch: 6 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:30,549-Speed 3373.01 
samples/sec Loss 4.8183 LearningRate 0.0490 Epoch: 6 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:33,556-Speed 3405.59 samples/sec Loss 4.7501 LearningRate 0.0490 Epoch: 6 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:36,594-Speed 3371.62 samples/sec Loss 4.7698 LearningRate 0.0490 Epoch: 6 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:39,622-Speed 3382.96 samples/sec Loss 4.7712 LearningRate 0.0490 Epoch: 6 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:42,699-Speed 3329.00 samples/sec Loss 4.6857 LearningRate 0.0490 Epoch: 6 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:09:45,691-Speed 3423.78 samples/sec Loss 4.7413 LearningRate 0.0490 Epoch: 6 Global Step: 74610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:09:48,785-Speed 3310.22 samples/sec Loss 4.7189 LearningRate 0.0489 Epoch: 6 Global Step: 74620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:09:51,829-Speed 3365.85 samples/sec Loss 4.7313 LearningRate 0.0489 Epoch: 6 Global Step: 74630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:09:54,934-Speed 3298.97 samples/sec Loss 4.8303 LearningRate 0.0489 Epoch: 6 Global Step: 74640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:09:57,982-Speed 3361.04 samples/sec Loss 4.7515 LearningRate 0.0489 Epoch: 6 Global Step: 74650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:01,055-Speed 3333.47 samples/sec Loss 4.7981 LearningRate 0.0489 Epoch: 6 Global Step: 74660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:04,142-Speed 3317.56 samples/sec Loss 4.9135 LearningRate 0.0489 Epoch: 6 Global Step: 74670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:07,221-Speed 3327.55 samples/sec Loss 4.8174 LearningRate 0.0489 Epoch: 6 
Global Step: 74680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:10,257-Speed 3373.50 samples/sec Loss 4.8762 LearningRate 0.0489 Epoch: 6 Global Step: 74690 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:13,299-Speed 3367.40 samples/sec Loss 4.8226 LearningRate 0.0489 Epoch: 6 Global Step: 74700 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:16,329-Speed 3380.15 samples/sec Loss 4.7504 LearningRate 0.0489 Epoch: 6 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:10:19,811-Speed 2942.16 samples/sec Loss 4.8493 LearningRate 0.0489 Epoch: 6 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:10:22,872-Speed 3345.82 samples/sec Loss 4.8081 LearningRate 0.0489 Epoch: 6 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:10:25,916-Speed 3364.40 samples/sec Loss 4.7263 LearningRate 0.0489 Epoch: 6 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:10:28,995-Speed 3327.05 samples/sec Loss 4.9118 LearningRate 0.0489 Epoch: 6 Global Step: 74750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:32,032-Speed 3373.89 samples/sec Loss 4.8410 LearningRate 0.0489 Epoch: 6 Global Step: 74760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:35,072-Speed 3369.67 samples/sec Loss 4.7766 LearningRate 0.0489 Epoch: 6 Global Step: 74770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:38,171-Speed 3304.81 samples/sec Loss 4.8567 LearningRate 0.0489 Epoch: 6 Global Step: 74780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:41,259-Speed 3317.48 samples/sec Loss 4.8317 LearningRate 0.0489 Epoch: 6 Global Step: 74790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:44,346-Speed 3318.18 samples/sec Loss 4.7810 LearningRate 0.0488 Epoch: 6 Global Step: 74800 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 08:10:47,523-Speed 3223.63 samples/sec Loss 4.8855 LearningRate 0.0488 Epoch: 6 Global Step: 74810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:50,600-Speed 3329.00 samples/sec Loss 4.9118 LearningRate 0.0488 Epoch: 6 Global Step: 74820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:53,700-Speed 3304.00 samples/sec Loss 4.8462 LearningRate 0.0488 Epoch: 6 Global Step: 74830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:10:57,480-Speed 2709.87 samples/sec Loss 4.8710 LearningRate 0.0488 Epoch: 6 Global Step: 74840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:00,503-Speed 3389.05 samples/sec Loss 5.0124 LearningRate 0.0488 Epoch: 6 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:11:03,566-Speed 3343.43 samples/sec Loss 4.8551 LearningRate 0.0488 Epoch: 6 Global Step: 74860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:06,633-Speed 3340.93 samples/sec Loss 4.8696 LearningRate 0.0488 Epoch: 6 Global Step: 74870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:09,721-Speed 3316.74 samples/sec Loss 4.8790 LearningRate 0.0488 Epoch: 6 Global Step: 74880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:12,769-Speed 3360.48 samples/sec Loss 4.8224 LearningRate 0.0488 Epoch: 6 Global Step: 74890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:15,861-Speed 3312.67 samples/sec Loss 4.8497 LearningRate 0.0488 Epoch: 6 Global Step: 74900 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:18,934-Speed 3332.95 samples/sec Loss 4.8016 LearningRate 0.0488 Epoch: 6 Global Step: 74910 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:21,967-Speed 3377.38 samples/sec Loss 4.8402 LearningRate 0.0488 Epoch: 6 Global Step: 74920 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:24,994-Speed 3384.70 
samples/sec Loss 4.9777 LearningRate 0.0488 Epoch: 6 Global Step: 74930 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:28,117-Speed 3279.26 samples/sec Loss 4.9048 LearningRate 0.0488 Epoch: 6 Global Step: 74940 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:31,139-Speed 3390.02 samples/sec Loss 4.8849 LearningRate 0.0488 Epoch: 6 Global Step: 74950 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:34,214-Speed 3330.53 samples/sec Loss 4.7797 LearningRate 0.0488 Epoch: 6 Global Step: 74960 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:37,227-Speed 3400.75 samples/sec Loss 4.9418 LearningRate 0.0488 Epoch: 6 Global Step: 74970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:40,344-Speed 3285.64 samples/sec Loss 4.9276 LearningRate 0.0487 Epoch: 6 Global Step: 74980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:43,479-Speed 3266.95 samples/sec Loss 4.9339 LearningRate 0.0487 Epoch: 6 Global Step: 74990 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:11:46,582-Speed 3301.13 samples/sec Loss 4.9476 LearningRate 0.0487 Epoch: 6 Global Step: 75000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:49,625-Speed 3366.29 samples/sec Loss 4.9324 LearningRate 0.0487 Epoch: 6 Global Step: 75010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:52,764-Speed 3262.98 samples/sec Loss 4.9094 LearningRate 0.0487 Epoch: 6 Global Step: 75020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:55,832-Speed 3339.68 samples/sec Loss 4.9472 LearningRate 0.0487 Epoch: 6 Global Step: 75030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:11:58,887-Speed 3352.46 samples/sec Loss 4.9657 LearningRate 0.0487 Epoch: 6 Global Step: 75040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:12:02,002-Speed 3288.34 samples/sec Loss 4.9753 LearningRate 0.0487 Epoch: 6 
Global Step: 75050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:12:05,085-Speed 3322.76 samples/sec Loss 4.8835 LearningRate 0.0487 Epoch: 6 Global Step: 75060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:12:08,125-Speed 3369.96 samples/sec Loss 4.9279 LearningRate 0.0487 Epoch: 6 Global Step: 75070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:12:11,148-Speed 3387.83 samples/sec Loss 4.9680 LearningRate 0.0487 Epoch: 6 Global Step: 75080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:12:14,245-Speed 3307.60 samples/sec Loss 4.8848 LearningRate 0.0487 Epoch: 6 Global Step: 75090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:12:17,395-Speed 3252.26 samples/sec Loss 4.9831 LearningRate 0.0487 Epoch: 6 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:20,440-Speed 3363.93 samples/sec Loss 4.9848 LearningRate 0.0487 Epoch: 6 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:23,446-Speed 3407.55 samples/sec Loss 4.9771 LearningRate 0.0487 Epoch: 6 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:26,490-Speed 3365.00 samples/sec Loss 4.9712 LearningRate 0.0487 Epoch: 6 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:29,556-Speed 3340.42 samples/sec Loss 4.9268 LearningRate 0.0487 Epoch: 6 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:32,593-Speed 3373.72 samples/sec Loss 5.0148 LearningRate 0.0486 Epoch: 6 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:35,652-Speed 3348.72 samples/sec Loss 4.9900 LearningRate 0.0486 Epoch: 6 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:38,708-Speed 3351.87 samples/sec Loss 4.9824 LearningRate 0.0486 Epoch: 6 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 08:12:41,774-Speed 3340.15 samples/sec Loss 4.8776 LearningRate 0.0486 Epoch: 6 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:44,859-Speed 3321.38 samples/sec Loss 5.1187 LearningRate 0.0486 Epoch: 6 Global Step: 75190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:47,912-Speed 3354.87 samples/sec Loss 4.9585 LearningRate 0.0486 Epoch: 6 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:12:50,922-Speed 3402.62 samples/sec Loss 4.9637 LearningRate 0.0486 Epoch: 6 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:53,967-Speed 3364.91 samples/sec Loss 4.9949 LearningRate 0.0486 Epoch: 6 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:12:57,004-Speed 3372.43 samples/sec Loss 5.0211 LearningRate 0.0486 Epoch: 6 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:13:00,023-Speed 3392.35 samples/sec Loss 5.0195 LearningRate 0.0486 Epoch: 6 Global Step: 75240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:03,097-Speed 3332.53 samples/sec Loss 5.0141 LearningRate 0.0486 Epoch: 6 Global Step: 75250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:06,133-Speed 3374.18 samples/sec Loss 5.0119 LearningRate 0.0486 Epoch: 6 Global Step: 75260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:09,143-Speed 3402.59 samples/sec Loss 4.9926 LearningRate 0.0486 Epoch: 6 Global Step: 75270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:12,174-Speed 3379.49 samples/sec Loss 4.9861 LearningRate 0.0486 Epoch: 6 Global Step: 75280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:15,252-Speed 3328.53 samples/sec Loss 5.0014 LearningRate 0.0486 Epoch: 6 Global Step: 75290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:18,280-Speed 3381.86 
samples/sec Loss 4.9913 LearningRate 0.0486 Epoch: 6 Global Step: 75300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:21,300-Speed 3392.10 samples/sec Loss 4.9769 LearningRate 0.0486 Epoch: 6 Global Step: 75310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:24,336-Speed 3374.62 samples/sec Loss 5.0233 LearningRate 0.0486 Epoch: 6 Global Step: 75320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:27,499-Speed 3238.27 samples/sec Loss 5.0693 LearningRate 0.0485 Epoch: 6 Global Step: 75330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:13:30,539-Speed 3369.96 samples/sec Loss 4.9641 LearningRate 0.0485 Epoch: 6 Global Step: 75340 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:33,557-Speed 3393.31 samples/sec Loss 5.0859 LearningRate 0.0485 Epoch: 6 Global Step: 75350 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:36,602-Speed 3364.67 samples/sec Loss 5.0762 LearningRate 0.0485 Epoch: 6 Global Step: 75360 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:39,635-Speed 3377.60 samples/sec Loss 5.0289 LearningRate 0.0485 Epoch: 6 Global Step: 75370 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:42,713-Speed 3328.11 samples/sec Loss 4.9976 LearningRate 0.0485 Epoch: 6 Global Step: 75380 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:45,734-Speed 3391.24 samples/sec Loss 5.1118 LearningRate 0.0485 Epoch: 6 Global Step: 75390 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:48,874-Speed 3261.50 samples/sec Loss 5.0239 LearningRate 0.0485 Epoch: 6 Global Step: 75400 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:51,930-Speed 3351.93 samples/sec Loss 5.0028 LearningRate 0.0485 Epoch: 6 Global Step: 75410 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:54,980-Speed 3358.45 samples/sec Loss 5.0882 LearningRate 0.0485 Epoch: 6 
Global Step: 75420 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:13:58,009-Speed 3382.54 samples/sec Loss 5.0234 LearningRate 0.0485 Epoch: 6 Global Step: 75430 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:14:01,072-Speed 3344.39 samples/sec Loss 5.0254 LearningRate 0.0485 Epoch: 6 Global Step: 75440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:04,125-Speed 3354.56 samples/sec Loss 5.0234 LearningRate 0.0485 Epoch: 6 Global Step: 75450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:07,209-Speed 3321.26 samples/sec Loss 4.9764 LearningRate 0.0485 Epoch: 6 Global Step: 75460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:10,239-Speed 3380.42 samples/sec Loss 5.0700 LearningRate 0.0485 Epoch: 6 Global Step: 75470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:13,308-Speed 3338.84 samples/sec Loss 5.0868 LearningRate 0.0485 Epoch: 6 Global Step: 75480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:16,334-Speed 3384.53 samples/sec Loss 4.9865 LearningRate 0.0485 Epoch: 6 Global Step: 75490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:19,359-Speed 3386.59 samples/sec Loss 5.0977 LearningRate 0.0485 Epoch: 6 Global Step: 75500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:22,456-Speed 3306.71 samples/sec Loss 5.0956 LearningRate 0.0484 Epoch: 6 Global Step: 75510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:25,495-Speed 3371.44 samples/sec Loss 5.0326 LearningRate 0.0484 Epoch: 6 Global Step: 75520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:28,570-Speed 3330.98 samples/sec Loss 5.0850 LearningRate 0.0484 Epoch: 6 Global Step: 75530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:31,692-Speed 3280.82 samples/sec Loss 5.1611 LearningRate 0.0484 Epoch: 6 Global Step: 75540 Fp16 Grad Scale: 32768 Required: 15 
hours Training: 2022-04-27 08:14:34,705-Speed 3399.68 samples/sec Loss 5.0852 LearningRate 0.0484 Epoch: 6 Global Step: 75550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:37,793-Speed 3317.06 samples/sec Loss 5.1250 LearningRate 0.0484 Epoch: 6 Global Step: 75560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:40,925-Speed 3270.25 samples/sec Loss 5.1669 LearningRate 0.0484 Epoch: 6 Global Step: 75570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:43,994-Speed 3337.97 samples/sec Loss 5.0753 LearningRate 0.0484 Epoch: 6 Global Step: 75580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:47,022-Speed 3382.53 samples/sec Loss 5.0601 LearningRate 0.0484 Epoch: 6 Global Step: 75590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:50,124-Speed 3302.14 samples/sec Loss 5.1475 LearningRate 0.0484 Epoch: 6 Global Step: 75600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:53,232-Speed 3296.36 samples/sec Loss 5.1892 LearningRate 0.0484 Epoch: 6 Global Step: 75610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:56,294-Speed 3345.37 samples/sec Loss 5.0592 LearningRate 0.0484 Epoch: 6 Global Step: 75620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:14:59,400-Speed 3297.40 samples/sec Loss 5.2112 LearningRate 0.0484 Epoch: 6 Global Step: 75630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:02,458-Speed 3350.10 samples/sec Loss 5.1221 LearningRate 0.0484 Epoch: 6 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:05,582-Speed 3278.55 samples/sec Loss 5.2115 LearningRate 0.0484 Epoch: 6 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:08,633-Speed 3357.14 samples/sec Loss 5.1072 LearningRate 0.0484 Epoch: 6 Global Step: 75660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:11,693-Speed 3348.17 
samples/sec Loss 5.1723 LearningRate 0.0484 Epoch: 6 Global Step: 75670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:14,846-Speed 3248.36 samples/sec Loss 5.1980 LearningRate 0.0484 Epoch: 6 Global Step: 75680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:17,998-Speed 3249.71 samples/sec Loss 5.2321 LearningRate 0.0483 Epoch: 6 Global Step: 75690 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:21,040-Speed 3367.24 samples/sec Loss 5.2417 LearningRate 0.0483 Epoch: 6 Global Step: 75700 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:24,146-Speed 3297.66 samples/sec Loss 5.1490 LearningRate 0.0483 Epoch: 6 Global Step: 75710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:27,199-Speed 3355.63 samples/sec Loss 5.1945 LearningRate 0.0483 Epoch: 6 Global Step: 75720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:30,285-Speed 3318.63 samples/sec Loss 5.0760 LearningRate 0.0483 Epoch: 6 Global Step: 75730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:33,313-Speed 3383.63 samples/sec Loss 5.1508 LearningRate 0.0483 Epoch: 6 Global Step: 75740 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:36,339-Speed 3385.10 samples/sec Loss 5.1207 LearningRate 0.0483 Epoch: 6 Global Step: 75750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:39,384-Speed 3363.68 samples/sec Loss 5.1998 LearningRate 0.0483 Epoch: 6 Global Step: 75760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:15:42,470-Speed 3319.47 samples/sec Loss 5.1357 LearningRate 0.0483 Epoch: 6 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:45,532-Speed 3344.86 samples/sec Loss 5.1718 LearningRate 0.0483 Epoch: 6 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:48,585-Speed 3355.91 samples/sec Loss 5.2406 LearningRate 0.0483 Epoch: 6 
Global Step: 75790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:51,637-Speed 3355.60 samples/sec Loss 5.2187 LearningRate 0.0483 Epoch: 6 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:54,714-Speed 3329.63 samples/sec Loss 5.1670 LearningRate 0.0483 Epoch: 6 Global Step: 75810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:15:57,750-Speed 3373.02 samples/sec Loss 5.2259 LearningRate 0.0483 Epoch: 6 Global Step: 75820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:00,806-Speed 3351.65 samples/sec Loss 5.1339 LearningRate 0.0483 Epoch: 6 Global Step: 75830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:03,854-Speed 3360.34 samples/sec Loss 5.1925 LearningRate 0.0483 Epoch: 6 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:06,960-Speed 3298.91 samples/sec Loss 5.1500 LearningRate 0.0483 Epoch: 6 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:10,015-Speed 3352.80 samples/sec Loss 5.1809 LearningRate 0.0483 Epoch: 6 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:13,062-Speed 3361.85 samples/sec Loss 5.1658 LearningRate 0.0482 Epoch: 6 Global Step: 75870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:16:16,164-Speed 3302.34 samples/sec Loss 5.2730 LearningRate 0.0482 Epoch: 6 Global Step: 75880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:16:19,228-Speed 3343.76 samples/sec Loss 5.1876 LearningRate 0.0482 Epoch: 6 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:22,291-Speed 3343.94 samples/sec Loss 5.1758 LearningRate 0.0482 Epoch: 6 Global Step: 75900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:25,383-Speed 3312.69 samples/sec Loss 5.2691 LearningRate 0.0482 Epoch: 6 Global Step: 75910 Fp16 Grad Scale: 32768 Required: 
15 hours Training: 2022-04-27 08:16:28,454-Speed 3335.72 samples/sec Loss 5.1382 LearningRate 0.0482 Epoch: 6 Global Step: 75920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:31,539-Speed 3320.82 samples/sec Loss 5.0889 LearningRate 0.0482 Epoch: 6 Global Step: 75930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:34,585-Speed 3362.36 samples/sec Loss 5.1764 LearningRate 0.0482 Epoch: 6 Global Step: 75940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:37,640-Speed 3353.57 samples/sec Loss 5.0488 LearningRate 0.0482 Epoch: 6 Global Step: 75950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:40,756-Speed 3286.35 samples/sec Loss 5.1298 LearningRate 0.0482 Epoch: 6 Global Step: 75960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:43,867-Speed 3292.49 samples/sec Loss 5.1748 LearningRate 0.0482 Epoch: 6 Global Step: 75970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:46,978-Speed 3293.57 samples/sec Loss 5.2369 LearningRate 0.0482 Epoch: 6 Global Step: 75980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:49,998-Speed 3391.55 samples/sec Loss 5.1949 LearningRate 0.0482 Epoch: 6 Global Step: 75990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:16:53,062-Speed 3342.93 samples/sec Loss 5.2824 LearningRate 0.0482 Epoch: 6 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:56,099-Speed 3372.99 samples/sec Loss 5.2904 LearningRate 0.0482 Epoch: 6 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:16:59,137-Speed 3371.18 samples/sec Loss 5.2340 LearningRate 0.0482 Epoch: 6 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:02,186-Speed 3360.35 samples/sec Loss 5.2043 LearningRate 0.0482 Epoch: 6 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:05,294-Speed 
3295.61 samples/sec Loss 5.2266 LearningRate 0.0482 Epoch: 6 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:08,343-Speed 3358.78 samples/sec Loss 5.2248 LearningRate 0.0481 Epoch: 6 Global Step: 76050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:11,371-Speed 3383.52 samples/sec Loss 5.2359 LearningRate 0.0481 Epoch: 6 Global Step: 76060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:14,434-Speed 3343.47 samples/sec Loss 5.1466 LearningRate 0.0481 Epoch: 6 Global Step: 76070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:17,506-Speed 3334.56 samples/sec Loss 5.1356 LearningRate 0.0481 Epoch: 6 Global Step: 76080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:20,580-Speed 3333.11 samples/sec Loss 5.2995 LearningRate 0.0481 Epoch: 6 Global Step: 76090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:23,643-Speed 3343.42 samples/sec Loss 5.2193 LearningRate 0.0481 Epoch: 6 Global Step: 76100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:26,724-Speed 3325.23 samples/sec Loss 5.2841 LearningRate 0.0481 Epoch: 6 Global Step: 76110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:29,818-Speed 3310.36 samples/sec Loss 5.3471 LearningRate 0.0481 Epoch: 6 Global Step: 76120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:32,883-Speed 3342.61 samples/sec Loss 5.1891 LearningRate 0.0481 Epoch: 6 Global Step: 76130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:35,926-Speed 3366.20 samples/sec Loss 5.2335 LearningRate 0.0481 Epoch: 6 Global Step: 76140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:17:39,049-Speed 3279.77 samples/sec Loss 5.1403 LearningRate 0.0481 Epoch: 6 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:42,165-Speed 3286.72 samples/sec Loss 5.2880 LearningRate 0.0481 
Epoch: 6 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:45,240-Speed 3332.33 samples/sec Loss 5.3248 LearningRate 0.0481 Epoch: 6 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:48,328-Speed 3316.16 samples/sec Loss 5.2287 LearningRate 0.0481 Epoch: 6 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:51,421-Speed 3311.84 samples/sec Loss 5.2686 LearningRate 0.0481 Epoch: 6 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:54,477-Speed 3352.39 samples/sec Loss 5.1943 LearningRate 0.0481 Epoch: 6 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:17:57,546-Speed 3338.08 samples/sec Loss 5.2356 LearningRate 0.0481 Epoch: 6 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:00,607-Speed 3345.92 samples/sec Loss 5.1522 LearningRate 0.0480 Epoch: 6 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:03,639-Speed 3378.41 samples/sec Loss 5.2266 LearningRate 0.0480 Epoch: 6 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:06,766-Speed 3275.80 samples/sec Loss 5.2503 LearningRate 0.0480 Epoch: 6 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:09,800-Speed 3375.99 samples/sec Loss 5.2485 LearningRate 0.0480 Epoch: 6 Global Step: 76250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:18:12,877-Speed 3329.49 samples/sec Loss 5.2665 LearningRate 0.0480 Epoch: 6 Global Step: 76260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:15,932-Speed 3352.32 samples/sec Loss 5.2763 LearningRate 0.0480 Epoch: 6 Global Step: 76270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:18,983-Speed 3357.12 samples/sec Loss 5.2776 LearningRate 0.0480 Epoch: 6 Global Step: 76280 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-27 08:18:22,024-Speed 3368.77 samples/sec Loss 5.4557 LearningRate 0.0480 Epoch: 6 Global Step: 76290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:25,078-Speed 3355.21 samples/sec Loss 5.2734 LearningRate 0.0480 Epoch: 6 Global Step: 76300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:28,214-Speed 3266.04 samples/sec Loss 5.3511 LearningRate 0.0480 Epoch: 6 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:31,362-Speed 3254.94 samples/sec Loss 5.2537 LearningRate 0.0480 Epoch: 6 Global Step: 76320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:34,398-Speed 3373.64 samples/sec Loss 5.2358 LearningRate 0.0480 Epoch: 6 Global Step: 76330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:37,482-Speed 3321.07 samples/sec Loss 5.3362 LearningRate 0.0480 Epoch: 6 Global Step: 76340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:40,594-Speed 3291.48 samples/sec Loss 5.3537 LearningRate 0.0480 Epoch: 6 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:43,638-Speed 3365.35 samples/sec Loss 5.2717 LearningRate 0.0480 Epoch: 6 Global Step: 76360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:18:46,674-Speed 3374.36 samples/sec Loss 5.3335 LearningRate 0.0480 Epoch: 6 Global Step: 76370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:18:49,797-Speed 3279.93 samples/sec Loss 5.3273 LearningRate 0.0480 Epoch: 6 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:52,877-Speed 3324.91 samples/sec Loss 5.2664 LearningRate 0.0480 Epoch: 6 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:18:55,921-Speed 3365.41 samples/sec Loss 5.3513 LearningRate 0.0479 Epoch: 6 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 
08:18:59,045-Speed 3278.48 samples/sec Loss 5.3969 LearningRate 0.0479 Epoch: 6 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:19:02,073-Speed 3382.77 samples/sec Loss 5.3042 LearningRate 0.0479 Epoch: 6 Global Step: 76420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:05,184-Speed 3293.59 samples/sec Loss 5.3133 LearningRate 0.0479 Epoch: 6 Global Step: 76430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:08,278-Speed 3309.74 samples/sec Loss 5.3582 LearningRate 0.0479 Epoch: 6 Global Step: 76440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:11,341-Speed 3344.76 samples/sec Loss 5.2769 LearningRate 0.0479 Epoch: 6 Global Step: 76450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:14,435-Speed 3310.47 samples/sec Loss 5.3687 LearningRate 0.0479 Epoch: 6 Global Step: 76460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:17,515-Speed 3325.75 samples/sec Loss 5.3558 LearningRate 0.0479 Epoch: 6 Global Step: 76470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:20,551-Speed 3373.57 samples/sec Loss 5.3185 LearningRate 0.0479 Epoch: 6 Global Step: 76480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:23,734-Speed 3218.20 samples/sec Loss 5.3610 LearningRate 0.0479 Epoch: 6 Global Step: 76490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:26,887-Speed 3249.63 samples/sec Loss 5.2880 LearningRate 0.0479 Epoch: 6 Global Step: 76500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:30,045-Speed 3243.33 samples/sec Loss 5.3829 LearningRate 0.0479 Epoch: 6 Global Step: 76510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:33,119-Speed 3332.43 samples/sec Loss 5.3390 LearningRate 0.0479 Epoch: 6 Global Step: 76520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:19:36,203-Speed 3321.43 samples/sec Loss 5.2862 
LearningRate 0.0479 Epoch: 6 Global Step: 76530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:39,320-Speed 3286.49 samples/sec Loss 5.4550 LearningRate 0.0479 Epoch: 6 Global Step: 76540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:42,377-Speed 3350.84 samples/sec Loss 5.3905 LearningRate 0.0479 Epoch: 6 Global Step: 76550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:45,440-Speed 3344.12 samples/sec Loss 5.4095 LearningRate 0.0479 Epoch: 6 Global Step: 76560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:48,527-Speed 3318.86 samples/sec Loss 5.4039 LearningRate 0.0479 Epoch: 6 Global Step: 76570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:51,636-Speed 3294.72 samples/sec Loss 5.3502 LearningRate 0.0478 Epoch: 6 Global Step: 76580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:54,697-Speed 3346.13 samples/sec Loss 5.3959 LearningRate 0.0478 Epoch: 6 Global Step: 76590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:19:57,732-Speed 3375.53 samples/sec Loss 5.4502 LearningRate 0.0478 Epoch: 6 Global Step: 76600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:20:00,804-Speed 3333.85 samples/sec Loss 5.3998 LearningRate 0.0478 Epoch: 6 Global Step: 76610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:20:03,921-Speed 3286.50 samples/sec Loss 5.3547 LearningRate 0.0478 Epoch: 6 Global Step: 76620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:20:07,018-Speed 3307.75 samples/sec Loss 5.3750 LearningRate 0.0478 Epoch: 6 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:10,103-Speed 3320.66 samples/sec Loss 5.4835 LearningRate 0.0478 Epoch: 6 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:13,192-Speed 3316.08 samples/sec Loss 5.3442 LearningRate 0.0478 Epoch: 6 Global Step: 76650 Fp16 
Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:16,297-Speed 3298.95 samples/sec Loss 5.3424 LearningRate 0.0478 Epoch: 6 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:19,391-Speed 3310.69 samples/sec Loss 5.3876 LearningRate 0.0478 Epoch: 6 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:22,449-Speed 3349.89 samples/sec Loss 5.3281 LearningRate 0.0478 Epoch: 6 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:25,613-Speed 3237.72 samples/sec Loss 5.4168 LearningRate 0.0478 Epoch: 6 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:28,725-Speed 3291.28 samples/sec Loss 5.2822 LearningRate 0.0478 Epoch: 6 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:31,815-Speed 3315.73 samples/sec Loss 5.3335 LearningRate 0.0478 Epoch: 6 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:34,876-Speed 3345.59 samples/sec Loss 5.2969 LearningRate 0.0478 Epoch: 6 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:38,050-Speed 3227.28 samples/sec Loss 5.3501 LearningRate 0.0478 Epoch: 6 Global Step: 76730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:20:41,152-Speed 3303.17 samples/sec Loss 5.4308 LearningRate 0.0478 Epoch: 6 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:44,210-Speed 3349.76 samples/sec Loss 5.3273 LearningRate 0.0478 Epoch: 6 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:47,295-Speed 3319.69 samples/sec Loss 5.4312 LearningRate 0.0477 Epoch: 6 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:50,447-Speed 3250.56 samples/sec Loss 5.3119 LearningRate 0.0477 Epoch: 6 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-27 08:20:53,565-Speed 3284.73 samples/sec Loss 5.4035 LearningRate 0.0477 Epoch: 6 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:56,649-Speed 3321.80 samples/sec Loss 5.2767 LearningRate 0.0477 Epoch: 6 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:20:59,744-Speed 3309.51 samples/sec Loss 5.3539 LearningRate 0.0477 Epoch: 6 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:02,836-Speed 3312.45 samples/sec Loss 5.2793 LearningRate 0.0477 Epoch: 6 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:05,898-Speed 3345.47 samples/sec Loss 5.3194 LearningRate 0.0477 Epoch: 6 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:08,955-Speed 3350.78 samples/sec Loss 5.3087 LearningRate 0.0477 Epoch: 6 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:12,035-Speed 3325.59 samples/sec Loss 5.2792 LearningRate 0.0477 Epoch: 6 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:15,124-Speed 3316.64 samples/sec Loss 5.3557 LearningRate 0.0477 Epoch: 6 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:18,264-Speed 3262.23 samples/sec Loss 5.4066 LearningRate 0.0477 Epoch: 6 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:21,323-Speed 3348.58 samples/sec Loss 5.4089 LearningRate 0.0477 Epoch: 6 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:24,409-Speed 3319.40 samples/sec Loss 5.3958 LearningRate 0.0477 Epoch: 6 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:27,500-Speed 3313.87 samples/sec Loss 5.3874 LearningRate 0.0477 Epoch: 6 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:30,587-Speed 3318.18 samples/sec Loss 
5.4487 LearningRate 0.0477 Epoch: 6 Global Step: 76900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:33,642-Speed 3352.62 samples/sec Loss 5.3655 LearningRate 0.0477 Epoch: 6 Global Step: 76910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:21:36,706-Speed 3343.23 samples/sec Loss 5.3529 LearningRate 0.0477 Epoch: 6 Global Step: 76920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:21:39,788-Speed 3324.15 samples/sec Loss 5.3577 LearningRate 0.0477 Epoch: 6 Global Step: 76930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:21:42,890-Speed 3302.29 samples/sec Loss 5.3680 LearningRate 0.0476 Epoch: 6 Global Step: 76940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:21:45,946-Speed 3351.35 samples/sec Loss 5.4940 LearningRate 0.0476 Epoch: 6 Global Step: 76950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:21:49,060-Speed 3289.76 samples/sec Loss 5.3652 LearningRate 0.0476 Epoch: 6 Global Step: 76960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:21:52,132-Speed 3334.80 samples/sec Loss 5.3834 LearningRate 0.0476 Epoch: 6 Global Step: 76970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:21:55,298-Speed 3234.85 samples/sec Loss 5.5264 LearningRate 0.0476 Epoch: 6 Global Step: 76980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:21:58,326-Speed 3383.37 samples/sec Loss 5.4155 LearningRate 0.0476 Epoch: 6 Global Step: 76990 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:01,395-Speed 3338.30 samples/sec Loss 5.4310 LearningRate 0.0476 Epoch: 6 Global Step: 77000 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:04,543-Speed 3253.64 samples/sec Loss 5.3508 LearningRate 0.0476 Epoch: 6 Global Step: 77010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:07,671-Speed 3274.35 samples/sec Loss 5.4325 LearningRate 0.0476 Epoch: 6 Global Step: 77020 
Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:10,721-Speed 3358.42 samples/sec Loss 5.4194 LearningRate 0.0476 Epoch: 6 Global Step: 77030 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:13,847-Speed 3277.47 samples/sec Loss 5.3813 LearningRate 0.0476 Epoch: 6 Global Step: 77040 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:16,977-Speed 3271.91 samples/sec Loss 5.4467 LearningRate 0.0476 Epoch: 6 Global Step: 77050 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:20,076-Speed 3305.94 samples/sec Loss 5.3650 LearningRate 0.0476 Epoch: 6 Global Step: 77060 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:22:23,164-Speed 3317.33 samples/sec Loss 5.5105 LearningRate 0.0476 Epoch: 6 Global Step: 77070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:26,243-Speed 3326.68 samples/sec Loss 5.3321 LearningRate 0.0476 Epoch: 6 Global Step: 77080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:29,332-Speed 3315.96 samples/sec Loss 5.3975 LearningRate 0.0476 Epoch: 6 Global Step: 77090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:32,385-Speed 3354.88 samples/sec Loss 5.4762 LearningRate 0.0476 Epoch: 6 Global Step: 77100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:35,473-Speed 3317.34 samples/sec Loss 5.3388 LearningRate 0.0476 Epoch: 6 Global Step: 77110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:38,606-Speed 3269.69 samples/sec Loss 5.5537 LearningRate 0.0475 Epoch: 6 Global Step: 77120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:41,730-Speed 3279.19 samples/sec Loss 5.3971 LearningRate 0.0475 Epoch: 6 Global Step: 77130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:44,789-Speed 3348.71 samples/sec Loss 5.5282 LearningRate 0.0475 Epoch: 6 Global Step: 77140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 
2022-04-27 08:22:47,895-Speed 3297.39 samples/sec Loss 5.4556 LearningRate 0.0475 Epoch: 6 Global Step: 77150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:50,974-Speed 3327.21 samples/sec Loss 5.4759 LearningRate 0.0475 Epoch: 6 Global Step: 77160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:22:54,066-Speed 3313.21 samples/sec Loss 5.4843 LearningRate 0.0475 Epoch: 6 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:22:57,107-Speed 3368.08 samples/sec Loss 5.4190 LearningRate 0.0475 Epoch: 6 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:00,236-Speed 3274.73 samples/sec Loss 5.5282 LearningRate 0.0475 Epoch: 6 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:03,322-Speed 3319.23 samples/sec Loss 5.4699 LearningRate 0.0475 Epoch: 6 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:06,436-Speed 3289.47 samples/sec Loss 5.4941 LearningRate 0.0475 Epoch: 6 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:09,527-Speed 3313.33 samples/sec Loss 5.4576 LearningRate 0.0475 Epoch: 6 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:12,623-Speed 3309.21 samples/sec Loss 5.4832 LearningRate 0.0475 Epoch: 6 Global Step: 77230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:15,705-Speed 3323.65 samples/sec Loss 5.3385 LearningRate 0.0475 Epoch: 6 Global Step: 77240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:18,806-Speed 3302.83 samples/sec Loss 5.4002 LearningRate 0.0475 Epoch: 6 Global Step: 77250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:21,870-Speed 3344.07 samples/sec Loss 5.3775 LearningRate 0.0475 Epoch: 6 Global Step: 77260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:24,963-Speed 3311.10 samples/sec Loss 
5.5144 LearningRate 0.0475 Epoch: 6 Global Step: 77270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:28,090-Speed 3276.09 samples/sec Loss 5.4621 LearningRate 0.0475 Epoch: 6 Global Step: 77280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:31,176-Speed 3318.57 samples/sec Loss 5.3980 LearningRate 0.0475 Epoch: 6 Global Step: 77290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:34,253-Speed 3329.41 samples/sec Loss 5.5243 LearningRate 0.0474 Epoch: 6 Global Step: 77300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:37,333-Speed 3326.12 samples/sec Loss 5.4815 LearningRate 0.0474 Epoch: 6 Global Step: 77310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:40,431-Speed 3306.04 samples/sec Loss 5.3886 LearningRate 0.0474 Epoch: 6 Global Step: 77320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:43,527-Speed 3308.87 samples/sec Loss 5.4316 LearningRate 0.0474 Epoch: 6 Global Step: 77330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:23:46,574-Speed 3361.35 samples/sec Loss 5.4769 LearningRate 0.0474 Epoch: 6 Global Step: 77340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:49,615-Speed 3368.45 samples/sec Loss 5.5475 LearningRate 0.0474 Epoch: 6 Global Step: 77350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:52,677-Speed 3344.95 samples/sec Loss 5.4513 LearningRate 0.0474 Epoch: 6 Global Step: 77360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:55,749-Speed 3335.04 samples/sec Loss 5.4129 LearningRate 0.0474 Epoch: 6 Global Step: 77370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:23:58,812-Speed 3343.73 samples/sec Loss 5.4373 LearningRate 0.0474 Epoch: 6 Global Step: 77380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:01,933-Speed 3282.40 samples/sec Loss 5.4257 LearningRate 0.0474 Epoch: 6 Global Step: 77390 
Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:05,082-Speed 3252.37 samples/sec Loss 5.4721 LearningRate 0.0474 Epoch: 6 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:08,153-Speed 3335.74 samples/sec Loss 5.5921 LearningRate 0.0474 Epoch: 6 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:11,219-Speed 3341.17 samples/sec Loss 5.5352 LearningRate 0.0474 Epoch: 6 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:14,340-Speed 3281.93 samples/sec Loss 5.5432 LearningRate 0.0474 Epoch: 6 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:17,404-Speed 3343.54 samples/sec Loss 5.4314 LearningRate 0.0474 Epoch: 6 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:20,475-Speed 3335.72 samples/sec Loss 5.4492 LearningRate 0.0474 Epoch: 6 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:23,580-Speed 3298.69 samples/sec Loss 5.5936 LearningRate 0.0474 Epoch: 6 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:26,645-Speed 3341.32 samples/sec Loss 5.4344 LearningRate 0.0474 Epoch: 6 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:29,754-Speed 3294.67 samples/sec Loss 5.4787 LearningRate 0.0473 Epoch: 6 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:32,790-Speed 3374.59 samples/sec Loss 5.5633 LearningRate 0.0473 Epoch: 6 Global Step: 77490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:35,894-Speed 3299.84 samples/sec Loss 5.4381 LearningRate 0.0473 Epoch: 6 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:39,015-Speed 3282.56 samples/sec Loss 5.5224 LearningRate 0.0473 Epoch: 6 Global Step: 77510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-27 08:24:42,106-Speed 3313.18 samples/sec Loss 5.6054 LearningRate 0.0473 Epoch: 6 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:45,171-Speed 3342.23 samples/sec Loss 5.5270 LearningRate 0.0473 Epoch: 6 Global Step: 77530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:48,251-Speed 3326.37 samples/sec Loss 5.4757 LearningRate 0.0473 Epoch: 6 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:51,315-Speed 3342.82 samples/sec Loss 5.6294 LearningRate 0.0473 Epoch: 6 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:54,417-Speed 3302.22 samples/sec Loss 5.4601 LearningRate 0.0473 Epoch: 6 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:24:57,464-Speed 3361.80 samples/sec Loss 5.3458 LearningRate 0.0473 Epoch: 6 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:00,503-Speed 3370.18 samples/sec Loss 5.4251 LearningRate 0.0473 Epoch: 6 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:03,584-Speed 3324.50 samples/sec Loss 5.5015 LearningRate 0.0473 Epoch: 6 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:06,697-Speed 3291.49 samples/sec Loss 5.3798 LearningRate 0.0473 Epoch: 6 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:09,747-Speed 3357.72 samples/sec Loss 5.5380 LearningRate 0.0473 Epoch: 6 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:12,826-Speed 3327.01 samples/sec Loss 5.5250 LearningRate 0.0473 Epoch: 6 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:16,019-Speed 3208.10 samples/sec Loss 5.5410 LearningRate 0.0473 Epoch: 6 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:19,080-Speed 3347.35 samples/sec Loss 
5.5828 LearningRate 0.0473 Epoch: 6 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:22,167-Speed 3317.78 samples/sec Loss 5.4179 LearningRate 0.0473 Epoch: 6 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:25,222-Speed 3353.08 samples/sec Loss 5.5501 LearningRate 0.0472 Epoch: 6 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:28,312-Speed 3314.52 samples/sec Loss 5.4523 LearningRate 0.0472 Epoch: 6 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:31,415-Speed 3301.83 samples/sec Loss 5.4447 LearningRate 0.0472 Epoch: 6 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:34,508-Speed 3311.26 samples/sec Loss 5.4936 LearningRate 0.0472 Epoch: 6 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:37,657-Speed 3252.76 samples/sec Loss 5.5209 LearningRate 0.0472 Epoch: 6 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:40,874-Speed 3184.36 samples/sec Loss 5.6091 LearningRate 0.0472 Epoch: 6 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:43,991-Speed 3285.21 samples/sec Loss 5.5608 LearningRate 0.0472 Epoch: 6 Global Step: 77720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:47,168-Speed 3224.03 samples/sec Loss 5.4946 LearningRate 0.0472 Epoch: 6 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:50,279-Speed 3292.81 samples/sec Loss 5.5674 LearningRate 0.0472 Epoch: 6 Global Step: 77740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 08:25:53,401-Speed 3281.85 samples/sec Loss 5.5783 LearningRate 0.0472 Epoch: 6 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:56,472-Speed 3334.67 samples/sec Loss 5.6238 LearningRate 0.0472 Epoch: 6 Global Step: 77760 
Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:25:59,546-Speed 3333.02 samples/sec Loss 5.5321 LearningRate 0.0472 Epoch: 6 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:02,614-Speed 3338.28 samples/sec Loss 5.5750 LearningRate 0.0472 Epoch: 6 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:05,676-Speed 3345.19 samples/sec Loss 5.5270 LearningRate 0.0472 Epoch: 6 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:08,737-Speed 3346.71 samples/sec Loss 5.5007 LearningRate 0.0472 Epoch: 6 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:11,830-Speed 3312.00 samples/sec Loss 5.5370 LearningRate 0.0472 Epoch: 6 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:15,052-Speed 3179.14 samples/sec Loss 5.5574 LearningRate 0.0472 Epoch: 6 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:18,144-Speed 3312.86 samples/sec Loss 5.5188 LearningRate 0.0472 Epoch: 6 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:21,200-Speed 3352.23 samples/sec Loss 5.5693 LearningRate 0.0472 Epoch: 6 Global Step: 77840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:24,263-Speed 3344.23 samples/sec Loss 5.5047 LearningRate 0.0471 Epoch: 6 Global Step: 77850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:27,337-Speed 3331.55 samples/sec Loss 5.4865 LearningRate 0.0471 Epoch: 6 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:30,510-Speed 3228.14 samples/sec Loss 5.4817 LearningRate 0.0471 Epoch: 6 Global Step: 77870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:33,609-Speed 3305.43 samples/sec Loss 5.5977 LearningRate 0.0471 Epoch: 6 Global Step: 77880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-27 08:26:36,767-Speed 3243.44 samples/sec Loss 5.6062 LearningRate 0.0471 Epoch: 6 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:39,869-Speed 3302.32 samples/sec Loss 5.5833 LearningRate 0.0471 Epoch: 6 Global Step: 77900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:43,020-Speed 3251.44 samples/sec Loss 5.5448 LearningRate 0.0471 Epoch: 6 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:46,083-Speed 3343.96 samples/sec Loss 5.5578 LearningRate 0.0471 Epoch: 6 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:49,195-Speed 3291.38 samples/sec Loss 5.5085 LearningRate 0.0471 Epoch: 6 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:52,316-Speed 3282.91 samples/sec Loss 5.4967 LearningRate 0.0471 Epoch: 6 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:55,433-Speed 3286.25 samples/sec Loss 5.6302 LearningRate 0.0471 Epoch: 6 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:26:58,549-Speed 3287.30 samples/sec Loss 5.5819 LearningRate 0.0471 Epoch: 6 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:27:01,648-Speed 3304.86 samples/sec Loss 5.4418 LearningRate 0.0471 Epoch: 6 Global Step: 77970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:04,758-Speed 3293.78 samples/sec Loss 5.6176 LearningRate 0.0471 Epoch: 6 Global Step: 77980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:07,855-Speed 3307.33 samples/sec Loss 5.5639 LearningRate 0.0471 Epoch: 6 Global Step: 77990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:10,913-Speed 3350.00 samples/sec Loss 5.5061 LearningRate 0.0471 Epoch: 6 Global Step: 78000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:14,074-Speed 3240.62 samples/sec Loss 
5.5279 LearningRate 0.0471 Epoch: 6 Global Step: 78010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:17,168-Speed 3310.48 samples/sec Loss 5.5704 LearningRate 0.0471 Epoch: 6 Global Step: 78020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:20,255-Speed 3317.24 samples/sec Loss 5.6311 LearningRate 0.0470 Epoch: 6 Global Step: 78030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:23,396-Speed 3261.07 samples/sec Loss 5.5369 LearningRate 0.0470 Epoch: 6 Global Step: 78040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:26,533-Speed 3265.77 samples/sec Loss 5.6350 LearningRate 0.0470 Epoch: 6 Global Step: 78050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:29,658-Speed 3277.33 samples/sec Loss 5.4820 LearningRate 0.0470 Epoch: 6 Global Step: 78060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:32,734-Speed 3330.22 samples/sec Loss 5.5625 LearningRate 0.0470 Epoch: 6 Global Step: 78070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:27:35,877-Speed 3258.96 samples/sec Loss 5.4674 LearningRate 0.0470 Epoch: 6 Global Step: 78080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:27:39,017-Speed 3262.18 samples/sec Loss 5.4850 LearningRate 0.0470 Epoch: 6 Global Step: 78090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:42,106-Speed 3316.27 samples/sec Loss 5.5392 LearningRate 0.0470 Epoch: 6 Global Step: 78100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:45,153-Speed 3361.99 samples/sec Loss 5.5622 LearningRate 0.0470 Epoch: 6 Global Step: 78110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:48,305-Speed 3249.51 samples/sec Loss 5.5719 LearningRate 0.0470 Epoch: 6 Global Step: 78120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:51,459-Speed 3247.73 samples/sec Loss 5.5404 LearningRate 0.0470 Epoch: 6 Global Step: 78130 
Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:54,550-Speed 3313.39 samples/sec Loss 5.5538 LearningRate 0.0470 Epoch: 6 Global Step: 78140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:27:57,581-Speed 3379.67 samples/sec Loss 5.4793 LearningRate 0.0470 Epoch: 6 Global Step: 78150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:00,635-Speed 3354.14 samples/sec Loss 5.6062 LearningRate 0.0470 Epoch: 6 Global Step: 78160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:03,717-Speed 3324.08 samples/sec Loss 5.5881 LearningRate 0.0470 Epoch: 6 Global Step: 78170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:06,811-Speed 3310.81 samples/sec Loss 5.6292 LearningRate 0.0470 Epoch: 6 Global Step: 78180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:09,940-Speed 3272.96 samples/sec Loss 5.6443 LearningRate 0.0470 Epoch: 6 Global Step: 78190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:28:13,016-Speed 3330.45 samples/sec Loss 5.6076 LearningRate 0.0470 Epoch: 6 Global Step: 78200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:16,161-Speed 3256.93 samples/sec Loss 5.6418 LearningRate 0.0469 Epoch: 6 Global Step: 78210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:19,287-Speed 3276.86 samples/sec Loss 5.5269 LearningRate 0.0469 Epoch: 6 Global Step: 78220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:22,351-Speed 3343.00 samples/sec Loss 5.6074 LearningRate 0.0469 Epoch: 6 Global Step: 78230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:25,471-Speed 3283.16 samples/sec Loss 5.5915 LearningRate 0.0469 Epoch: 6 Global Step: 78240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:28,558-Speed 3317.71 samples/sec Loss 5.5214 LearningRate 0.0469 Epoch: 6 Global Step: 78250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 
2022-04-27 08:28:31,704-Speed 3255.65 samples/sec Loss 5.6774 LearningRate 0.0469 Epoch: 6 Global Step: 78260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:34,803-Speed 3305.23 samples/sec Loss 5.5418 LearningRate 0.0469 Epoch: 6 Global Step: 78270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:37,966-Speed 3238.28 samples/sec Loss 5.6638 LearningRate 0.0469 Epoch: 6 Global Step: 78280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:41,097-Speed 3272.55 samples/sec Loss 5.6027 LearningRate 0.0469 Epoch: 6 Global Step: 78290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:28:44,178-Speed 3323.71 samples/sec Loss 5.6129 LearningRate 0.0469 Epoch: 6 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:28:47,270-Speed 3312.79 samples/sec Loss 5.6073 LearningRate 0.0469 Epoch: 6 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:28:50,339-Speed 3338.04 samples/sec Loss 5.6285 LearningRate 0.0469 Epoch: 6 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:28:53,542-Speed 3198.15 samples/sec Loss 5.5382 LearningRate 0.0469 Epoch: 6 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:28:56,640-Speed 3306.32 samples/sec Loss 5.6352 LearningRate 0.0469 Epoch: 6 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:28:59,696-Speed 3352.32 samples/sec Loss 5.6679 LearningRate 0.0469 Epoch: 6 Global Step: 78350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:02,841-Speed 3256.87 samples/sec Loss 5.5967 LearningRate 0.0469 Epoch: 6 Global Step: 78360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:05,940-Speed 3305.18 samples/sec Loss 5.4606 LearningRate 0.0469 Epoch: 6 Global Step: 78370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:09,034-Speed 3310.03 samples/sec Loss 
5.5617 LearningRate 0.0469 Epoch: 6 Global Step: 78380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:12,102-Speed 3338.84 samples/sec Loss 5.6265 LearningRate 0.0468 Epoch: 6 Global Step: 78390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:15,256-Speed 3248.26 samples/sec Loss 5.5223 LearningRate 0.0468 Epoch: 6 Global Step: 78400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:18,404-Speed 3253.81 samples/sec Loss 5.5235 LearningRate 0.0468 Epoch: 6 Global Step: 78410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:21,437-Speed 3376.87 samples/sec Loss 5.5569 LearningRate 0.0468 Epoch: 6 Global Step: 78420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:24,559-Speed 3280.84 samples/sec Loss 5.5431 LearningRate 0.0468 Epoch: 6 Global Step: 78430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:27,608-Speed 3359.84 samples/sec Loss 5.6078 LearningRate 0.0468 Epoch: 6 Global Step: 78440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:30,687-Speed 3327.57 samples/sec Loss 5.6653 LearningRate 0.0468 Epoch: 6 Global Step: 78450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:29:33,758-Speed 3334.91 samples/sec Loss 5.6032 LearningRate 0.0468 Epoch: 6 Global Step: 78460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:29:36,823-Speed 3341.54 samples/sec Loss 5.6040 LearningRate 0.0468 Epoch: 6 Global Step: 78470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:39,888-Speed 3342.64 samples/sec Loss 5.5987 LearningRate 0.0468 Epoch: 6 Global Step: 78480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:42,976-Speed 3317.20 samples/sec Loss 5.7325 LearningRate 0.0468 Epoch: 6 Global Step: 78490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:46,036-Speed 3346.97 samples/sec Loss 5.6198 LearningRate 0.0468 Epoch: 6 Global Step: 78500 
Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:49,144-Speed 3295.90 samples/sec Loss 5.6226 LearningRate 0.0468 Epoch: 6 Global Step: 78510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:52,208-Speed 3343.45 samples/sec Loss 5.5812 LearningRate 0.0468 Epoch: 6 Global Step: 78520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:55,332-Speed 3279.54 samples/sec Loss 5.6279 LearningRate 0.0468 Epoch: 6 Global Step: 78530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:29:58,414-Speed 3323.64 samples/sec Loss 5.5432 LearningRate 0.0468 Epoch: 6 Global Step: 78540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:01,619-Speed 3195.18 samples/sec Loss 5.6241 LearningRate 0.0468 Epoch: 6 Global Step: 78550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:04,712-Speed 3311.43 samples/sec Loss 5.5813 LearningRate 0.0468 Epoch: 6 Global Step: 78560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:07,797-Speed 3320.83 samples/sec Loss 5.5960 LearningRate 0.0467 Epoch: 6 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 08:30:10,888-Speed 3314.51 samples/sec Loss 5.5736 LearningRate 0.0467 Epoch: 6 Global Step: 78580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:14,048-Speed 3241.38 samples/sec Loss 5.5905 LearningRate 0.0467 Epoch: 6 Global Step: 78590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:17,213-Speed 3235.52 samples/sec Loss 5.5592 LearningRate 0.0467 Epoch: 6 Global Step: 78600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:20,349-Speed 3266.93 samples/sec Loss 5.5943 LearningRate 0.0467 Epoch: 6 Global Step: 78610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:23,436-Speed 3317.74 samples/sec Loss 5.7186 LearningRate 0.0467 Epoch: 6 Global Step: 78620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 
2022-04-27 08:30:26,516-Speed 3326.57 samples/sec Loss 5.6286 LearningRate 0.0467 Epoch: 6 Global Step: 78630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:29,594-Speed 3327.64 samples/sec Loss 5.5563 LearningRate 0.0467 Epoch: 6 Global Step: 78640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:32,693-Speed 3304.82 samples/sec Loss 5.6154 LearningRate 0.0467 Epoch: 6 Global Step: 78650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:30:35,859-Speed 3235.86 samples/sec Loss 5.5606 LearningRate 0.0467 Epoch: 6 Global Step: 78660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:30:38,968-Speed 3294.06 samples/sec Loss 5.5502 LearningRate 0.0467 Epoch: 6 Global Step: 78670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:30:42,079-Speed 3292.54 samples/sec Loss 5.6490 LearningRate 0.0467 Epoch: 6 Global Step: 78680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:30:45,137-Speed 3350.10 samples/sec Loss 5.5558 LearningRate 0.0467 Epoch: 6 Global Step: 78690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:30:48,232-Speed 3310.20 samples/sec Loss 5.5197 LearningRate 0.0467 Epoch: 6 Global Step: 78700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:30:51,306-Speed 3331.26 samples/sec Loss 5.6832 LearningRate 0.0467 Epoch: 6 Global Step: 78710 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:30:54,388-Speed 3324.18 samples/sec Loss 5.5815 LearningRate 0.0467 Epoch: 6 Global Step: 78720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:30:57,470-Speed 3324.00 samples/sec Loss 5.6216 LearningRate 0.0467 Epoch: 6 Global Step: 78730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:00,570-Speed 3304.10 samples/sec Loss 5.5728 LearningRate 0.0467 Epoch: 6 Global Step: 78740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:03,721-Speed 3249.92 samples/sec Loss 
5.6735 LearningRate 0.0466 Epoch: 6 Global Step: 78750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:06,836-Speed 3288.91 samples/sec Loss 5.6544 LearningRate 0.0466 Epoch: 6 Global Step: 78760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:09,902-Speed 3340.70 samples/sec Loss 5.6967 LearningRate 0.0466 Epoch: 6 Global Step: 78770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:12,961-Speed 3349.09 samples/sec Loss 5.6840 LearningRate 0.0466 Epoch: 6 Global Step: 78780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:16,019-Speed 3349.50 samples/sec Loss 5.7265 LearningRate 0.0466 Epoch: 6 Global Step: 78790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:19,098-Speed 3326.43 samples/sec Loss 5.6670 LearningRate 0.0466 Epoch: 6 Global Step: 78800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:22,167-Speed 3338.34 samples/sec Loss 5.5912 LearningRate 0.0466 Epoch: 6 Global Step: 78810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:25,270-Speed 3301.33 samples/sec Loss 5.5593 LearningRate 0.0466 Epoch: 6 Global Step: 78820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:28,374-Speed 3299.80 samples/sec Loss 5.6596 LearningRate 0.0466 Epoch: 6 Global Step: 78830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:31,442-Speed 3338.96 samples/sec Loss 5.6502 LearningRate 0.0466 Epoch: 6 Global Step: 78840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 08:31:34,482-Speed 3369.80 samples/sec Loss 5.5644 LearningRate 0.0466 Epoch: 6 Global Step: 78850 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:37,556-Speed 3331.85 samples/sec Loss 5.6784 LearningRate 0.0466 Epoch: 6 Global Step: 78860 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:40,620-Speed 3344.04 samples/sec Loss 5.6171 LearningRate 0.0466 Epoch: 6 Global Step: 78870 
Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:43,677-Speed 3350.30 samples/sec Loss 5.6726 LearningRate 0.0466 Epoch: 6 Global Step: 78880 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:46,769-Speed 3312.75 samples/sec Loss 5.6747 LearningRate 0.0466 Epoch: 6 Global Step: 78890 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-27 08:31:49,841-Speed 3334.02 samples/sec Loss 5.6569 LearningRate 0.0466 Epoch: 6 Global Step: 78900 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:31:52,918-Speed 3329.98 samples/sec Loss 5.5416 LearningRate 0.0466 Epoch: 6 Global Step: 78910 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:31:55,978-Speed 3346.39 samples/sec Loss 5.6043 LearningRate 0.0466 Epoch: 6 Global Step: 78920 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:31:59,075-Speed 3308.34 samples/sec Loss 5.6578 LearningRate 0.0465 Epoch: 6 Global Step: 78930 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:32:02,207-Speed 3270.18 samples/sec Loss 5.6879 LearningRate 0.0465 Epoch: 6 Global Step: 78940 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:32:05,303-Speed 3308.69 samples/sec Loss 5.6071 LearningRate 0.0465 Epoch: 6 Global Step: 78950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:32:08,363-Speed 3347.80 samples/sec Loss 5.7144 LearningRate 0.0465 Epoch: 6 Global Step: 78960 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:32:11,450-Speed 3317.51 samples/sec Loss 5.6537 LearningRate 0.0465 Epoch: 6 Global Step: 78970 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:32:14,550-Speed 3304.20 samples/sec Loss 5.6292 LearningRate 0.0465 Epoch: 6 Global Step: 78980 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:32:17,616-Speed 3341.34 samples/sec Loss 5.7396 LearningRate 0.0465 Epoch: 6 Global Step: 78990 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-04-27 08:32:20,714-Speed 3306.66 samples/sec Loss 5.5658 LearningRate 0.0465 Epoch: 6 Global Step: 79000 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:32:23,814-Speed 3303.83 samples/sec Loss 5.5639 LearningRate 0.0465 Epoch: 6 Global Step: 79010 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:32:26,900-Speed 3319.80 samples/sec Loss 5.6291 LearningRate 0.0465 Epoch: 6 Global Step: 79020 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:29,995-Speed 3308.96 samples/sec Loss 5.7939 LearningRate 0.0465 Epoch: 6 Global Step: 79030 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:33,087-Speed 3313.70 samples/sec Loss 5.5624 LearningRate 0.0465 Epoch: 6 Global Step: 79040 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:36,135-Speed 3360.54 samples/sec Loss 5.7365 LearningRate 0.0465 Epoch: 6 Global Step: 79050 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:39,199-Speed 3342.98 samples/sec Loss 5.6821 LearningRate 0.0465 Epoch: 6 Global Step: 79060 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:42,287-Speed 3317.81 samples/sec Loss 5.5100 LearningRate 0.0465 Epoch: 6 Global Step: 79070 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:45,333-Speed 3362.54 samples/sec Loss 5.6233 LearningRate 0.0465 Epoch: 6 Global Step: 79080 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:48,429-Speed 3308.88 samples/sec Loss 5.6612 LearningRate 0.0465 Epoch: 6 Global Step: 79090 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:51,590-Speed 3240.72 samples/sec Loss 5.7307 LearningRate 0.0465 Epoch: 6 Global Step: 79100 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:55,471-Speed 2638.85 samples/sec Loss 5.7701 LearningRate 0.0465 Epoch: 6 Global Step: 79110 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:32:58,525-Speed 3353.79 samples/sec Loss 5.6808 
LearningRate 0.0464 Epoch: 6 Global Step: 79120 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:01,602-Speed 3329.40 samples/sec Loss 5.7852 LearningRate 0.0464 Epoch: 6 Global Step: 79130 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:04,708-Speed 3297.81 samples/sec Loss 5.6803 LearningRate 0.0464 Epoch: 6 Global Step: 79140 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:07,786-Speed 3328.22 samples/sec Loss 5.6747 LearningRate 0.0464 Epoch: 6 Global Step: 79150 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:10,878-Speed 3312.38 samples/sec Loss 5.7684 LearningRate 0.0464 Epoch: 6 Global Step: 79160 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:13,949-Speed 3335.77 samples/sec Loss 5.6301 LearningRate 0.0464 Epoch: 6 Global Step: 79170 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:17,042-Speed 3312.12 samples/sec Loss 5.7325 LearningRate 0.0464 Epoch: 6 Global Step: 79180 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:20,175-Speed 3269.75 samples/sec Loss 5.5323 LearningRate 0.0464 Epoch: 6 Global Step: 79190 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:23,266-Speed 3313.37 samples/sec Loss 5.6842 LearningRate 0.0464 Epoch: 6 Global Step: 79200 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:26,365-Speed 3305.72 samples/sec Loss 5.6414 LearningRate 0.0464 Epoch: 6 Global Step: 79210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:33:29,413-Speed 3360.72 samples/sec Loss 5.6238 LearningRate 0.0464 Epoch: 6 Global Step: 79220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:32,529-Speed 3287.32 samples/sec Loss 5.5980 LearningRate 0.0464 Epoch: 6 Global Step: 79230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:35,608-Speed 3326.79 samples/sec Loss 5.7063 LearningRate 0.0464 Epoch: 6 Global Step: 79240 Fp16 
Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:38,847-Speed 3162.96 samples/sec Loss 5.6660 LearningRate 0.0464 Epoch: 6 Global Step: 79250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:42,027-Speed 3220.41 samples/sec Loss 5.6566 LearningRate 0.0464 Epoch: 6 Global Step: 79260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:45,107-Speed 3326.33 samples/sec Loss 5.7086 LearningRate 0.0464 Epoch: 6 Global Step: 79270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:48,222-Speed 3287.84 samples/sec Loss 5.7611 LearningRate 0.0464 Epoch: 6 Global Step: 79280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:51,281-Speed 3348.99 samples/sec Loss 5.6292 LearningRate 0.0464 Epoch: 6 Global Step: 79290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:54,394-Speed 3290.42 samples/sec Loss 5.6948 LearningRate 0.0463 Epoch: 6 Global Step: 79300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:33:57,464-Speed 3336.84 samples/sec Loss 5.6216 LearningRate 0.0463 Epoch: 6 Global Step: 79310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:00,551-Speed 3318.25 samples/sec Loss 5.5460 LearningRate 0.0463 Epoch: 6 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:34:03,646-Speed 3309.49 samples/sec Loss 5.7050 LearningRate 0.0463 Epoch: 6 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:34:06,751-Speed 3299.16 samples/sec Loss 5.6754 LearningRate 0.0463 Epoch: 6 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:34:09,799-Speed 3360.38 samples/sec Loss 5.7000 LearningRate 0.0463 Epoch: 6 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:34:12,854-Speed 3353.28 samples/sec Loss 5.7245 LearningRate 0.0463 Epoch: 6 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 08:34:15,934-Speed 3325.36 samples/sec Loss 5.7624 LearningRate 0.0463 Epoch: 6 Global Step: 79370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:19,068-Speed 3268.38 samples/sec Loss 5.6062 LearningRate 0.0463 Epoch: 6 Global Step: 79380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:22,132-Speed 3343.58 samples/sec Loss 5.6917 LearningRate 0.0463 Epoch: 6 Global Step: 79390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:25,224-Speed 3312.86 samples/sec Loss 5.7602 LearningRate 0.0463 Epoch: 6 Global Step: 79400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:28,334-Speed 3292.90 samples/sec Loss 5.6536 LearningRate 0.0463 Epoch: 6 Global Step: 79410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:32,691-Speed 2350.68 samples/sec Loss 5.6729 LearningRate 0.0463 Epoch: 6 Global Step: 79420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:36,920-Speed 2422.29 samples/sec Loss 5.6854 LearningRate 0.0463 Epoch: 6 Global Step: 79430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:40,057-Speed 3265.00 samples/sec Loss 5.6358 LearningRate 0.0463 Epoch: 6 Global Step: 79440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:43,145-Speed 3317.63 samples/sec Loss 5.7253 LearningRate 0.0463 Epoch: 6 Global Step: 79450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:46,224-Speed 3326.68 samples/sec Loss 5.7650 LearningRate 0.0463 Epoch: 6 Global Step: 79460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:49,287-Speed 3344.29 samples/sec Loss 5.7606 LearningRate 0.0463 Epoch: 6 Global Step: 79470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:52,423-Speed 3266.33 samples/sec Loss 5.6244 LearningRate 0.0462 Epoch: 6 Global Step: 79480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:55,494-Speed 3335.81 samples/sec Loss 
5.6231 LearningRate 0.0462 Epoch: 6 Global Step: 79490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:34:58,567-Speed 3333.33 samples/sec Loss 5.6800 LearningRate 0.0462 Epoch: 6 Global Step: 79500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:01,720-Speed 3248.15 samples/sec Loss 5.7302 LearningRate 0.0462 Epoch: 6 Global Step: 79510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:04,801-Speed 3325.13 samples/sec Loss 5.5766 LearningRate 0.0462 Epoch: 6 Global Step: 79520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:07,872-Speed 3335.38 samples/sec Loss 5.7133 LearningRate 0.0462 Epoch: 6 Global Step: 79530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:10,956-Speed 3321.51 samples/sec Loss 5.6954 LearningRate 0.0462 Epoch: 6 Global Step: 79540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:14,029-Speed 3333.30 samples/sec Loss 5.7743 LearningRate 0.0462 Epoch: 6 Global Step: 79550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:17,107-Speed 3327.43 samples/sec Loss 5.6884 LearningRate 0.0462 Epoch: 6 Global Step: 79560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:20,185-Speed 3328.27 samples/sec Loss 5.6395 LearningRate 0.0462 Epoch: 6 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:35:23,307-Speed 3280.66 samples/sec Loss 5.8140 LearningRate 0.0462 Epoch: 6 Global Step: 79580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:35:26,356-Speed 3359.67 samples/sec Loss 5.6804 LearningRate 0.0462 Epoch: 6 Global Step: 79590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:29,438-Speed 3323.29 samples/sec Loss 5.7495 LearningRate 0.0462 Epoch: 6 Global Step: 79600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:32,508-Speed 3336.54 samples/sec Loss 5.6500 LearningRate 0.0462 Epoch: 6 Global Step: 79610 
Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:35,636-Speed 3274.92 samples/sec Loss 5.7470 LearningRate 0.0462 Epoch: 6 Global Step: 79620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:38,732-Speed 3308.97 samples/sec Loss 5.6890 LearningRate 0.0462 Epoch: 6 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:41,820-Speed 3317.04 samples/sec Loss 5.6222 LearningRate 0.0462 Epoch: 6 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:44,896-Speed 3329.95 samples/sec Loss 5.7324 LearningRate 0.0462 Epoch: 6 Global Step: 79650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:47,942-Speed 3363.26 samples/sec Loss 5.6142 LearningRate 0.0461 Epoch: 6 Global Step: 79660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:51,009-Speed 3339.02 samples/sec Loss 5.7467 LearningRate 0.0461 Epoch: 6 Global Step: 79670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:54,112-Speed 3300.66 samples/sec Loss 5.7576 LearningRate 0.0461 Epoch: 6 Global Step: 79680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:35:57,189-Speed 3329.67 samples/sec Loss 5.6665 LearningRate 0.0461 Epoch: 6 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:36:00,282-Speed 3311.18 samples/sec Loss 5.6770 LearningRate 0.0461 Epoch: 6 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:36:03,362-Speed 3326.30 samples/sec Loss 5.7122 LearningRate 0.0461 Epoch: 6 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:06,423-Speed 3346.57 samples/sec Loss 5.6550 LearningRate 0.0461 Epoch: 6 Global Step: 79720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:09,497-Speed 3331.74 samples/sec Loss 5.7949 LearningRate 0.0461 Epoch: 6 Global Step: 79730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 
2022-04-27 08:36:12,621-Speed 3279.66 samples/sec Loss 5.6488 LearningRate 0.0461 Epoch: 6 Global Step: 79740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:15,751-Speed 3272.01 samples/sec Loss 5.6485 LearningRate 0.0461 Epoch: 6 Global Step: 79750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:18,807-Speed 3352.57 samples/sec Loss 5.7193 LearningRate 0.0461 Epoch: 6 Global Step: 79760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:21,878-Speed 3335.65 samples/sec Loss 5.7494 LearningRate 0.0461 Epoch: 6 Global Step: 79770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:24,993-Speed 3287.45 samples/sec Loss 5.8339 LearningRate 0.0461 Epoch: 6 Global Step: 79780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:28,108-Speed 3288.68 samples/sec Loss 5.8068 LearningRate 0.0461 Epoch: 6 Global Step: 79790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:31,236-Speed 3274.84 samples/sec Loss 5.6476 LearningRate 0.0461 Epoch: 6 Global Step: 79800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:34,299-Speed 3344.23 samples/sec Loss 5.5885 LearningRate 0.0461 Epoch: 6 Global Step: 79810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:36:37,372-Speed 3332.89 samples/sec Loss 5.6425 LearningRate 0.0461 Epoch: 6 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:36:40,504-Speed 3270.15 samples/sec Loss 5.6616 LearningRate 0.0461 Epoch: 6 Global Step: 79830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:36:43,606-Speed 3302.99 samples/sec Loss 5.6261 LearningRate 0.0461 Epoch: 6 Global Step: 79840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:36:46,693-Speed 3318.16 samples/sec Loss 5.6235 LearningRate 0.0460 Epoch: 6 Global Step: 79850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:36:49,756-Speed 3344.22 samples/sec Loss 
5.6119 LearningRate 0.0460 Epoch: 6 Global Step: 79860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:52,900-Speed 3257.21 samples/sec Loss 5.7501 LearningRate 0.0460 Epoch: 6 Global Step: 79870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:55,983-Speed 3322.59 samples/sec Loss 5.7794 LearningRate 0.0460 Epoch: 6 Global Step: 79880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:36:59,038-Speed 3353.44 samples/sec Loss 5.6675 LearningRate 0.0460 Epoch: 6 Global Step: 79890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:02,153-Speed 3288.93 samples/sec Loss 5.7089 LearningRate 0.0460 Epoch: 6 Global Step: 79900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:05,218-Speed 3342.37 samples/sec Loss 5.6645 LearningRate 0.0460 Epoch: 6 Global Step: 79910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:08,302-Speed 3320.35 samples/sec Loss 5.7310 LearningRate 0.0460 Epoch: 6 Global Step: 79920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:11,377-Speed 3330.86 samples/sec Loss 5.7109 LearningRate 0.0460 Epoch: 6 Global Step: 79930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:14,452-Speed 3332.28 samples/sec Loss 5.8439 LearningRate 0.0460 Epoch: 6 Global Step: 79940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:17,514-Speed 3345.10 samples/sec Loss 5.7832 LearningRate 0.0460 Epoch: 6 Global Step: 79950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:20,560-Speed 3362.46 samples/sec Loss 5.6520 LearningRate 0.0460 Epoch: 6 Global Step: 79960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:23,728-Speed 3233.20 samples/sec Loss 5.7867 LearningRate 0.0460 Epoch: 6 Global Step: 79970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:26,803-Speed 3331.21 samples/sec Loss 5.6619 LearningRate 0.0460 Epoch: 6 Global Step: 79980 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:29,915-Speed 3291.53 samples/sec Loss 5.6834 LearningRate 0.0460 Epoch: 6 Global Step: 79990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:33,008-Speed 3312.09 samples/sec Loss 5.7391 LearningRate 0.0460 Epoch: 6 Global Step: 80000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:36,150-Speed 3259.89 samples/sec Loss 5.6290 LearningRate 0.0460 Epoch: 6 Global Step: 80010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:39,241-Speed 3313.64 samples/sec Loss 5.6763 LearningRate 0.0460 Epoch: 6 Global Step: 80020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:42,344-Speed 3300.80 samples/sec Loss 5.6940 LearningRate 0.0459 Epoch: 6 Global Step: 80030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:45,412-Speed 3339.70 samples/sec Loss 5.6797 LearningRate 0.0459 Epoch: 6 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:48,471-Speed 3348.52 samples/sec Loss 5.7992 LearningRate 0.0459 Epoch: 6 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:37:51,547-Speed 3329.90 samples/sec Loss 5.7690 LearningRate 0.0459 Epoch: 6 Global Step: 80060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:37:54,629-Speed 3323.27 samples/sec Loss 5.7484 LearningRate 0.0459 Epoch: 6 Global Step: 80070 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:37:57,745-Speed 3287.25 samples/sec Loss 5.7127 LearningRate 0.0459 Epoch: 6 Global Step: 80080 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:00,798-Speed 3355.64 samples/sec Loss 5.7983 LearningRate 0.0459 Epoch: 6 Global Step: 80090 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:03,894-Speed 3308.13 samples/sec Loss 5.6889 LearningRate 0.0459 Epoch: 6 Global Step: 80100 Fp16 Grad Scale: 16384 Required: 14 hours Training: 
2022-04-27 08:38:07,063-Speed 3232.57 samples/sec Loss 5.7669 LearningRate 0.0459 Epoch: 6 Global Step: 80110 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:10,130-Speed 3339.85 samples/sec Loss 5.7563 LearningRate 0.0459 Epoch: 6 Global Step: 80120 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:13,253-Speed 3280.09 samples/sec Loss 5.6938 LearningRate 0.0459 Epoch: 6 Global Step: 80130 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:16,375-Speed 3280.68 samples/sec Loss 5.7202 LearningRate 0.0459 Epoch: 6 Global Step: 80140 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:19,455-Speed 3325.86 samples/sec Loss 5.7397 LearningRate 0.0459 Epoch: 6 Global Step: 80150 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:22,524-Speed 3337.18 samples/sec Loss 5.7526 LearningRate 0.0459 Epoch: 6 Global Step: 80160 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:38:25,672-Speed 3254.85 samples/sec Loss 5.7292 LearningRate 0.0459 Epoch: 6 Global Step: 80170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:28,796-Speed 3278.81 samples/sec Loss 5.7417 LearningRate 0.0459 Epoch: 6 Global Step: 80180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:31,908-Speed 3290.94 samples/sec Loss 5.8612 LearningRate 0.0459 Epoch: 6 Global Step: 80190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:34,975-Speed 3340.21 samples/sec Loss 5.6825 LearningRate 0.0459 Epoch: 6 Global Step: 80200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:38,068-Speed 3311.29 samples/sec Loss 5.7293 LearningRate 0.0458 Epoch: 6 Global Step: 80210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:41,236-Speed 3233.55 samples/sec Loss 5.7629 LearningRate 0.0458 Epoch: 6 Global Step: 80220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:44,317-Speed 3325.09 samples/sec Loss 
5.7933 LearningRate 0.0458 Epoch: 6 Global Step: 80230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:47,379-Speed 3345.34 samples/sec Loss 5.7369 LearningRate 0.0458 Epoch: 6 Global Step: 80240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:50,576-Speed 3203.51 samples/sec Loss 5.7708 LearningRate 0.0458 Epoch: 6 Global Step: 80250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:53,675-Speed 3306.05 samples/sec Loss 5.7803 LearningRate 0.0458 Epoch: 6 Global Step: 80260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:38:56,750-Speed 3330.60 samples/sec Loss 5.7134 LearningRate 0.0458 Epoch: 6 Global Step: 80270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:38:59,785-Speed 3375.51 samples/sec Loss 5.7261 LearningRate 0.0458 Epoch: 6 Global Step: 80280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:02,898-Speed 3289.95 samples/sec Loss 5.8196 LearningRate 0.0458 Epoch: 6 Global Step: 80290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:05,968-Speed 3337.54 samples/sec Loss 5.7838 LearningRate 0.0458 Epoch: 6 Global Step: 80300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:09,039-Speed 3334.77 samples/sec Loss 5.6616 LearningRate 0.0458 Epoch: 6 Global Step: 80310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:12,197-Speed 3243.88 samples/sec Loss 5.8469 LearningRate 0.0458 Epoch: 6 Global Step: 80320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:15,296-Speed 3305.72 samples/sec Loss 5.8013 LearningRate 0.0458 Epoch: 6 Global Step: 80330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:18,358-Speed 3344.53 samples/sec Loss 5.7678 LearningRate 0.0458 Epoch: 6 Global Step: 80340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:21,429-Speed 3336.19 samples/sec Loss 5.7228 LearningRate 0.0458 Epoch: 6 Global Step: 80350 
Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:24,549-Speed 3282.59 samples/sec Loss 5.8595 LearningRate 0.0458 Epoch: 6 Global Step: 80360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:27,621-Speed 3334.28 samples/sec Loss 5.7530 LearningRate 0.0458 Epoch: 6 Global Step: 80370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:39:30,757-Speed 3266.10 samples/sec Loss 5.7368 LearningRate 0.0458 Epoch: 6 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:33,863-Speed 3298.07 samples/sec Loss 5.7343 LearningRate 0.0458 Epoch: 6 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:37,011-Speed 3254.68 samples/sec Loss 5.9009 LearningRate 0.0457 Epoch: 6 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:40,168-Speed 3244.52 samples/sec Loss 5.6890 LearningRate 0.0457 Epoch: 6 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:43,326-Speed 3243.48 samples/sec Loss 5.7836 LearningRate 0.0457 Epoch: 6 Global Step: 80420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:46,387-Speed 3346.91 samples/sec Loss 5.7128 LearningRate 0.0457 Epoch: 6 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:49,448-Speed 3346.12 samples/sec Loss 5.6966 LearningRate 0.0457 Epoch: 6 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:52,506-Speed 3349.11 samples/sec Loss 5.7573 LearningRate 0.0457 Epoch: 6 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:55,643-Speed 3265.63 samples/sec Loss 5.7164 LearningRate 0.0457 Epoch: 6 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:39:58,696-Speed 3354.94 samples/sec Loss 5.7664 LearningRate 0.0457 Epoch: 6 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 08:40:01,838-Speed 3260.50 samples/sec Loss 5.6760 LearningRate 0.0457 Epoch: 6 Global Step: 80480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 08:40:04,937-Speed 3305.27 samples/sec Loss 5.6964 LearningRate 0.0457 Epoch: 6 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:40:08,019-Speed 3323.45 samples/sec Loss 5.6695 LearningRate 0.0457 Epoch: 6 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:40:11,090-Speed 3335.25 samples/sec Loss 5.8003 LearningRate 0.0457 Epoch: 6 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:40:14,238-Speed 3254.78 samples/sec Loss 5.8241 LearningRate 0.0457 Epoch: 6 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:40:17,359-Speed 3281.60 samples/sec Loss 5.7967 LearningRate 0.0457 Epoch: 6 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:40:20,428-Speed 3337.63 samples/sec Loss 5.8124 LearningRate 0.0457 Epoch: 6 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:40:23,461-Speed 3377.33 samples/sec Loss 5.6802 LearningRate 0.0457 Epoch: 6 Global Step: 80550 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:40:26,561-Speed 3304.89 samples/sec Loss 5.8234 LearningRate 0.0457 Epoch: 6 Global Step: 80560 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:29,644-Speed 3322.56 samples/sec Loss 5.7837 LearningRate 0.0457 Epoch: 6 Global Step: 80570 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:32,736-Speed 3312.23 samples/sec Loss 5.7451 LearningRate 0.0456 Epoch: 6 Global Step: 80580 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:35,795-Speed 3349.16 samples/sec Loss 5.9305 LearningRate 0.0456 Epoch: 6 Global Step: 80590 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:38,870-Speed 3330.31 samples/sec Loss 
5.7512 LearningRate 0.0456 Epoch: 6 Global Step: 80600 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:41,939-Speed 3337.86 samples/sec Loss 5.6604 LearningRate 0.0456 Epoch: 6 Global Step: 80610 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:45,002-Speed 3345.13 samples/sec Loss 5.8019 LearningRate 0.0456 Epoch: 6 Global Step: 80620 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:48,081-Speed 3325.92 samples/sec Loss 5.7531 LearningRate 0.0456 Epoch: 6 Global Step: 80630 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:51,221-Speed 3262.36 samples/sec Loss 5.7540 LearningRate 0.0456 Epoch: 6 Global Step: 80640 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:54,347-Speed 3277.45 samples/sec Loss 5.7478 LearningRate 0.0456 Epoch: 6 Global Step: 80650 Fp16 Grad Scale: 8192 Required: 14 hours Training: 2022-04-27 08:40:57,409-Speed 3344.55 samples/sec Loss 5.7408 LearningRate 0.0456 Epoch: 6 Global Step: 80660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:00,466-Speed 3350.20 samples/sec Loss 5.6583 LearningRate 0.0456 Epoch: 6 Global Step: 80670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:03,587-Speed 3282.58 samples/sec Loss 5.6862 LearningRate 0.0456 Epoch: 6 Global Step: 80680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:06,667-Speed 3325.58 samples/sec Loss 5.6594 LearningRate 0.0456 Epoch: 6 Global Step: 80690 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:09,768-Speed 3303.28 samples/sec Loss 5.6504 LearningRate 0.0456 Epoch: 6 Global Step: 80700 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:12,871-Speed 3300.71 samples/sec Loss 5.7564 LearningRate 0.0456 Epoch: 6 Global Step: 80710 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:16,038-Speed 3235.06 samples/sec Loss 5.8389 LearningRate 0.0456 Epoch: 6 Global Step: 80720 Fp16 
Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:19,165-Speed 3275.11 samples/sec Loss 5.7710 LearningRate 0.0456 Epoch: 6 Global Step: 80730 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:22,237-Speed 3334.62 samples/sec Loss 5.7043 LearningRate 0.0456 Epoch: 6 Global Step: 80740 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:25,433-Speed 3205.03 samples/sec Loss 5.9026 LearningRate 0.0456 Epoch: 6 Global Step: 80750 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:41:28,515-Speed 3323.82 samples/sec Loss 5.7974 LearningRate 0.0455 Epoch: 6 Global Step: 80760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:31,644-Speed 3273.67 samples/sec Loss 5.7782 LearningRate 0.0455 Epoch: 6 Global Step: 80770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:34,722-Speed 3327.56 samples/sec Loss 5.7794 LearningRate 0.0455 Epoch: 6 Global Step: 80780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:37,801-Speed 3327.86 samples/sec Loss 5.7887 LearningRate 0.0455 Epoch: 6 Global Step: 80790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:40,882-Speed 3325.05 samples/sec Loss 5.7628 LearningRate 0.0455 Epoch: 6 Global Step: 80800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:43,953-Speed 3334.26 samples/sec Loss 5.7575 LearningRate 0.0455 Epoch: 6 Global Step: 80810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:47,098-Speed 3257.55 samples/sec Loss 5.7628 LearningRate 0.0455 Epoch: 6 Global Step: 80820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:50,218-Speed 3283.06 samples/sec Loss 5.8736 LearningRate 0.0455 Epoch: 6 Global Step: 80830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:53,400-Speed 3218.92 samples/sec Loss 5.7356 LearningRate 0.0455 Epoch: 6 Global Step: 80840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 
2022-04-27 08:41:56,483-Speed 3322.92 samples/sec Loss 5.7147 LearningRate 0.0455 Epoch: 6 Global Step: 80850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:41:59,561-Speed 3327.95 samples/sec Loss 5.7495 LearningRate 0.0455 Epoch: 6 Global Step: 80860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:02,638-Speed 3328.54 samples/sec Loss 5.6467 LearningRate 0.0455 Epoch: 6 Global Step: 80870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:05,698-Speed 3347.72 samples/sec Loss 5.5813 LearningRate 0.0455 Epoch: 6 Global Step: 80880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:08,748-Speed 3358.61 samples/sec Loss 5.8311 LearningRate 0.0455 Epoch: 6 Global Step: 80890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:11,879-Speed 3271.57 samples/sec Loss 5.7636 LearningRate 0.0455 Epoch: 6 Global Step: 80900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:15,025-Speed 3255.98 samples/sec Loss 5.8243 LearningRate 0.0455 Epoch: 6 Global Step: 80910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:18,176-Speed 3250.72 samples/sec Loss 5.8053 LearningRate 0.0455 Epoch: 6 Global Step: 80920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:21,258-Speed 3322.98 samples/sec Loss 5.7784 LearningRate 0.0455 Epoch: 6 Global Step: 80930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:24,349-Speed 3314.85 samples/sec Loss 5.7745 LearningRate 0.0455 Epoch: 6 Global Step: 80940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:27,408-Speed 3348.52 samples/sec Loss 5.7906 LearningRate 0.0454 Epoch: 6 Global Step: 80950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:42:30,544-Speed 3266.05 samples/sec Loss 5.7290 LearningRate 0.0454 Epoch: 6 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:33,661-Speed 3285.54 samples/sec Loss 
5.8477 LearningRate 0.0454 Epoch: 6 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:36,763-Speed 3302.92 samples/sec Loss 5.8760 LearningRate 0.0454 Epoch: 6 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:39,929-Speed 3235.27 samples/sec Loss 5.7688 LearningRate 0.0454 Epoch: 6 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:43,123-Speed 3206.22 samples/sec Loss 5.7160 LearningRate 0.0454 Epoch: 6 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:46,216-Speed 3312.46 samples/sec Loss 5.7101 LearningRate 0.0454 Epoch: 6 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:49,385-Speed 3231.74 samples/sec Loss 5.8248 LearningRate 0.0454 Epoch: 6 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:52,574-Speed 3212.36 samples/sec Loss 5.7609 LearningRate 0.0454 Epoch: 6 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:55,662-Speed 3317.66 samples/sec Loss 5.7772 LearningRate 0.0454 Epoch: 6 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:42:58,745-Speed 3321.88 samples/sec Loss 5.9235 LearningRate 0.0454 Epoch: 6 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:43:01,872-Speed 3275.94 samples/sec Loss 5.7161 LearningRate 0.0454 Epoch: 6 Global Step: 81060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 08:43:04,980-Speed 3296.03 samples/sec Loss 5.8775 LearningRate 0.0454 Epoch: 6 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:43:08,075-Speed 3309.80 samples/sec Loss 5.7085 LearningRate 0.0454 Epoch: 6 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:43:11,143-Speed 3338.67 samples/sec Loss 5.8521 LearningRate 0.0454 Epoch: 6 Global Step: 81090 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:43:14,250-Speed 3296.29 samples/sec Loss 5.7816 LearningRate 0.0454 Epoch: 6 Global Step: 81100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:17,389-Speed 3263.71 samples/sec Loss 5.8076 LearningRate 0.0454 Epoch: 6 Global Step: 81110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:20,445-Speed 3351.75 samples/sec Loss 5.8608 LearningRate 0.0454 Epoch: 6 Global Step: 81120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:23,538-Speed 3311.47 samples/sec Loss 5.7840 LearningRate 0.0453 Epoch: 6 Global Step: 81130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:26,607-Speed 3337.40 samples/sec Loss 5.6938 LearningRate 0.0453 Epoch: 6 Global Step: 81140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:29,822-Speed 3185.74 samples/sec Loss 5.6285 LearningRate 0.0453 Epoch: 6 Global Step: 81150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:32,925-Speed 3301.43 samples/sec Loss 5.7772 LearningRate 0.0453 Epoch: 6 Global Step: 81160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:36,070-Speed 3257.12 samples/sec Loss 5.7540 LearningRate 0.0453 Epoch: 6 Global Step: 81170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:39,203-Speed 3269.60 samples/sec Loss 5.7439 LearningRate 0.0453 Epoch: 6 Global Step: 81180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:42,331-Speed 3274.56 samples/sec Loss 5.7774 LearningRate 0.0453 Epoch: 6 Global Step: 81190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:43:45,414-Speed 3322.15 samples/sec Loss 5.8375 LearningRate 0.0453 Epoch: 6 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:43:48,495-Speed 3324.59 samples/sec Loss 5.8886 LearningRate 0.0453 Epoch: 6 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 08:43:51,602-Speed 3297.24 samples/sec Loss 5.7786 LearningRate 0.0453 Epoch: 6 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:43:54,726-Speed 3279.26 samples/sec Loss 5.8951 LearningRate 0.0453 Epoch: 6 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:43:57,831-Speed 3298.04 samples/sec Loss 5.7889 LearningRate 0.0453 Epoch: 6 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:44:00,912-Speed 3324.84 samples/sec Loss 5.7453 LearningRate 0.0453 Epoch: 6 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:44:03,957-Speed 3364.67 samples/sec Loss 5.7977 LearningRate 0.0453 Epoch: 6 Global Step: 81260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:07,095-Speed 3263.88 samples/sec Loss 5.8486 LearningRate 0.0453 Epoch: 6 Global Step: 81270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:10,160-Speed 3342.44 samples/sec Loss 5.7974 LearningRate 0.0453 Epoch: 6 Global Step: 81280 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:13,297-Speed 3265.28 samples/sec Loss 5.6878 LearningRate 0.0453 Epoch: 6 Global Step: 81290 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:16,491-Speed 3207.13 samples/sec Loss 5.7781 LearningRate 0.0453 Epoch: 6 Global Step: 81300 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:19,572-Speed 3324.67 samples/sec Loss 5.6598 LearningRate 0.0453 Epoch: 6 Global Step: 81310 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:22,627-Speed 3352.71 samples/sec Loss 5.8162 LearningRate 0.0452 Epoch: 6 Global Step: 81320 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:25,716-Speed 3316.59 samples/sec Loss 5.7815 LearningRate 0.0452 Epoch: 6 Global Step: 81330 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:28,824-Speed 3295.53 samples/sec Loss 
5.8364 LearningRate 0.0452 Epoch: 6 Global Step: 81340 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:31,885-Speed 3346.33 samples/sec Loss 5.8118 LearningRate 0.0452 Epoch: 6 Global Step: 81350 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:34,964-Speed 3326.40 samples/sec Loss 5.7722 LearningRate 0.0452 Epoch: 6 Global Step: 81360 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:38,083-Speed 3284.41 samples/sec Loss 5.7888 LearningRate 0.0452 Epoch: 6 Global Step: 81370 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:44:41,229-Speed 3255.45 samples/sec Loss 5.7876 LearningRate 0.0452 Epoch: 6 Global Step: 81380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:44,331-Speed 3302.79 samples/sec Loss 5.8830 LearningRate 0.0452 Epoch: 6 Global Step: 81390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:47,459-Speed 3274.29 samples/sec Loss 5.8546 LearningRate 0.0452 Epoch: 6 Global Step: 81400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:50,568-Speed 3295.18 samples/sec Loss 5.7080 LearningRate 0.0452 Epoch: 6 Global Step: 81410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:53,624-Speed 3351.53 samples/sec Loss 5.8878 LearningRate 0.0452 Epoch: 6 Global Step: 81420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:56,714-Speed 3315.31 samples/sec Loss 5.8247 LearningRate 0.0452 Epoch: 6 Global Step: 81430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:44:59,852-Speed 3264.08 samples/sec Loss 5.8606 LearningRate 0.0452 Epoch: 6 Global Step: 81440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:02,933-Speed 3325.13 samples/sec Loss 5.7354 LearningRate 0.0452 Epoch: 6 Global Step: 81450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:06,049-Speed 3286.83 samples/sec Loss 5.7807 LearningRate 0.0452 Epoch: 6 Global Step: 81460 
Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:09,118-Speed 3337.68 samples/sec Loss 5.7470 LearningRate 0.0452 Epoch: 6 Global Step: 81470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:12,173-Speed 3352.58 samples/sec Loss 5.8233 LearningRate 0.0452 Epoch: 6 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:45:15,262-Speed 3316.88 samples/sec Loss 5.7801 LearningRate 0.0452 Epoch: 6 Global Step: 81490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:18,314-Speed 3355.71 samples/sec Loss 5.7194 LearningRate 0.0451 Epoch: 6 Global Step: 81500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:21,387-Speed 3333.80 samples/sec Loss 5.7496 LearningRate 0.0451 Epoch: 6 Global Step: 81510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:24,480-Speed 3311.40 samples/sec Loss 5.7973 LearningRate 0.0451 Epoch: 6 Global Step: 81520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:27,580-Speed 3304.74 samples/sec Loss 5.6993 LearningRate 0.0451 Epoch: 6 Global Step: 81530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:30,707-Speed 3275.72 samples/sec Loss 5.7618 LearningRate 0.0451 Epoch: 6 Global Step: 81540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:33,838-Speed 3271.16 samples/sec Loss 5.7960 LearningRate 0.0451 Epoch: 6 Global Step: 81550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:36,946-Speed 3296.21 samples/sec Loss 5.8473 LearningRate 0.0451 Epoch: 6 Global Step: 81560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:40,113-Speed 3233.49 samples/sec Loss 5.7771 LearningRate 0.0451 Epoch: 6 Global Step: 81570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:43,193-Speed 3326.47 samples/sec Loss 5.8259 LearningRate 0.0451 Epoch: 6 Global Step: 81580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 
2022-04-27 08:45:46,323-Speed 3271.83 samples/sec Loss 5.8194 LearningRate 0.0451 Epoch: 6 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:45:49,524-Speed 3199.71 samples/sec Loss 5.8521 LearningRate 0.0451 Epoch: 6 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:45:52,657-Speed 3269.35 samples/sec Loss 5.8378 LearningRate 0.0451 Epoch: 6 Global Step: 81610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:55,718-Speed 3346.63 samples/sec Loss 5.8337 LearningRate 0.0451 Epoch: 6 Global Step: 81620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:45:58,871-Speed 3249.37 samples/sec Loss 5.7351 LearningRate 0.0451 Epoch: 6 Global Step: 81630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:02,017-Speed 3254.99 samples/sec Loss 5.7938 LearningRate 0.0451 Epoch: 6 Global Step: 81640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:05,158-Speed 3261.87 samples/sec Loss 5.7414 LearningRate 0.0451 Epoch: 6 Global Step: 81650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:08,271-Speed 3290.93 samples/sec Loss 5.7307 LearningRate 0.0451 Epoch: 6 Global Step: 81660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:11,346-Speed 3330.48 samples/sec Loss 5.8595 LearningRate 0.0451 Epoch: 6 Global Step: 81670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:14,454-Speed 3295.62 samples/sec Loss 5.7524 LearningRate 0.0451 Epoch: 6 Global Step: 81680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:17,540-Speed 3320.05 samples/sec Loss 5.7594 LearningRate 0.0450 Epoch: 6 Global Step: 81690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:20,591-Speed 3357.06 samples/sec Loss 5.8660 LearningRate 0.0450 Epoch: 6 Global Step: 81700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:23,674-Speed 3322.52 samples/sec Loss 
5.8181 LearningRate 0.0450 Epoch: 6 Global Step: 81710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:46:26,793-Speed 3283.78 samples/sec Loss 5.7695 LearningRate 0.0450 Epoch: 6 Global Step: 81720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:29,936-Speed 3258.69 samples/sec Loss 5.7786 LearningRate 0.0450 Epoch: 6 Global Step: 81730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:33,037-Speed 3303.80 samples/sec Loss 5.8558 LearningRate 0.0450 Epoch: 6 Global Step: 81740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:36,115-Speed 3327.99 samples/sec Loss 5.8015 LearningRate 0.0450 Epoch: 6 Global Step: 81750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:39,207-Speed 3312.70 samples/sec Loss 5.7903 LearningRate 0.0450 Epoch: 6 Global Step: 81760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:42,321-Speed 3289.45 samples/sec Loss 5.7643 LearningRate 0.0450 Epoch: 6 Global Step: 81770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:45,397-Speed 3329.08 samples/sec Loss 5.9200 LearningRate 0.0450 Epoch: 6 Global Step: 81780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:48,485-Speed 3317.38 samples/sec Loss 5.7750 LearningRate 0.0450 Epoch: 6 Global Step: 81790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:51,591-Speed 3298.00 samples/sec Loss 5.8696 LearningRate 0.0450 Epoch: 6 Global Step: 81800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:54,656-Speed 3341.57 samples/sec Loss 5.8000 LearningRate 0.0450 Epoch: 6 Global Step: 81810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:46:57,706-Speed 3359.50 samples/sec Loss 5.7008 LearningRate 0.0450 Epoch: 6 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:00,771-Speed 3341.78 samples/sec Loss 5.8897 LearningRate 0.0450 Epoch: 6 Global Step: 81830 
Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:03,916-Speed 3256.89 samples/sec Loss 5.7808 LearningRate 0.0450 Epoch: 6 Global Step: 81840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:07,040-Speed 3279.15 samples/sec Loss 5.8727 LearningRate 0.0450 Epoch: 6 Global Step: 81850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:10,114-Speed 3332.58 samples/sec Loss 5.6585 LearningRate 0.0450 Epoch: 6 Global Step: 81860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:13,215-Speed 3302.99 samples/sec Loss 5.7344 LearningRate 0.0449 Epoch: 6 Global Step: 81870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:16,274-Speed 3348.75 samples/sec Loss 5.8298 LearningRate 0.0449 Epoch: 6 Global Step: 81880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:19,377-Speed 3300.62 samples/sec Loss 5.8251 LearningRate 0.0449 Epoch: 6 Global Step: 81890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:22,482-Speed 3299.62 samples/sec Loss 5.8413 LearningRate 0.0449 Epoch: 6 Global Step: 81900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:25,564-Speed 3323.68 samples/sec Loss 5.8083 LearningRate 0.0449 Epoch: 6 Global Step: 81910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:28,760-Speed 3204.38 samples/sec Loss 5.8537 LearningRate 0.0449 Epoch: 6 Global Step: 81920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:47:31,850-Speed 3314.96 samples/sec Loss 5.8991 LearningRate 0.0449 Epoch: 6 Global Step: 81930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:34,956-Speed 3298.88 samples/sec Loss 5.9025 LearningRate 0.0449 Epoch: 6 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:38,103-Speed 3254.84 samples/sec Loss 5.7021 LearningRate 0.0449 Epoch: 6 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 08:47:41,255-Speed 3249.73 samples/sec Loss 5.6870 LearningRate 0.0449 Epoch: 6 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:44,393-Speed 3263.38 samples/sec Loss 5.8189 LearningRate 0.0449 Epoch: 6 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:47,484-Speed 3315.20 samples/sec Loss 5.7964 LearningRate 0.0449 Epoch: 6 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:50,552-Speed 3338.48 samples/sec Loss 5.8714 LearningRate 0.0449 Epoch: 6 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:53,655-Speed 3301.35 samples/sec Loss 5.7357 LearningRate 0.0449 Epoch: 6 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:56,749-Speed 3310.64 samples/sec Loss 5.7511 LearningRate 0.0449 Epoch: 6 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:47:59,835-Speed 3318.78 samples/sec Loss 5.7761 LearningRate 0.0449 Epoch: 6 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:48:02,926-Speed 3313.82 samples/sec Loss 5.8361 LearningRate 0.0449 Epoch: 6 Global Step: 82030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 08:48:06,064-Speed 3264.50 samples/sec Loss 5.7670 LearningRate 0.0449 Epoch: 6 Global Step: 82040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 08:48:09,121-Speed 3350.68 samples/sec Loss 5.7881 LearningRate 0.0449 Epoch: 6 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:48:12,244-Speed 3279.94 samples/sec Loss 5.8322 LearningRate 0.0448 Epoch: 6 Global Step: 82060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:15,391-Speed 3255.10 samples/sec Loss 5.7794 LearningRate 0.0448 Epoch: 6 Global Step: 82070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:18,627-Speed 3165.03 samples/sec 
Loss 5.8126 LearningRate 0.0448 Epoch: 6 Global Step: 82080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:21,700-Speed 3333.44 samples/sec Loss 5.7671 LearningRate 0.0448 Epoch: 6 Global Step: 82090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:24,822-Speed 3281.17 samples/sec Loss 5.8004 LearningRate 0.0448 Epoch: 6 Global Step: 82100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:27,886-Speed 3343.05 samples/sec Loss 5.8931 LearningRate 0.0448 Epoch: 6 Global Step: 82110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:30,962-Speed 3330.25 samples/sec Loss 5.8016 LearningRate 0.0448 Epoch: 6 Global Step: 82120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:34,032-Speed 3336.35 samples/sec Loss 5.7691 LearningRate 0.0448 Epoch: 6 Global Step: 82130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:37,096-Speed 3342.70 samples/sec Loss 5.9062 LearningRate 0.0448 Epoch: 6 Global Step: 82140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:40,170-Speed 3332.47 samples/sec Loss 5.7576 LearningRate 0.0448 Epoch: 6 Global Step: 82150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:43,251-Speed 3324.63 samples/sec Loss 5.8247 LearningRate 0.0448 Epoch: 6 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:48:46,328-Speed 3328.95 samples/sec Loss 5.7121 LearningRate 0.0448 Epoch: 6 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:48:49,412-Speed 3321.85 samples/sec Loss 5.7580 LearningRate 0.0448 Epoch: 6 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:48:52,538-Speed 3276.21 samples/sec Loss 5.7383 LearningRate 0.0448 Epoch: 6 Global Step: 82190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:55,592-Speed 3354.36 samples/sec Loss 5.8347 LearningRate 0.0448 Epoch: 6 Global Step: 
82200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:48:58,677-Speed 3319.97 samples/sec Loss 5.7671 LearningRate 0.0448 Epoch: 6 Global Step: 82210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:01,849-Speed 3229.25 samples/sec Loss 5.8020 LearningRate 0.0448 Epoch: 6 Global Step: 82220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:04,981-Speed 3270.43 samples/sec Loss 5.7119 LearningRate 0.0448 Epoch: 6 Global Step: 82230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:08,167-Speed 3215.24 samples/sec Loss 5.8917 LearningRate 0.0447 Epoch: 6 Global Step: 82240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:11,278-Speed 3293.10 samples/sec Loss 5.7949 LearningRate 0.0447 Epoch: 6 Global Step: 82250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:14,372-Speed 3310.70 samples/sec Loss 5.7877 LearningRate 0.0447 Epoch: 6 Global Step: 82260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:17,580-Speed 3193.40 samples/sec Loss 5.7368 LearningRate 0.0447 Epoch: 6 Global Step: 82270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:20,683-Speed 3301.21 samples/sec Loss 5.9164 LearningRate 0.0447 Epoch: 6 Global Step: 82280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:23,766-Speed 3321.94 samples/sec Loss 5.9065 LearningRate 0.0447 Epoch: 6 Global Step: 82290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:26,861-Speed 3309.87 samples/sec Loss 5.7811 LearningRate 0.0447 Epoch: 6 Global Step: 82300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:29,960-Speed 3305.12 samples/sec Loss 5.8515 LearningRate 0.0447 Epoch: 6 Global Step: 82310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:33,026-Speed 3340.30 samples/sec Loss 5.7958 LearningRate 0.0447 Epoch: 6 Global Step: 82320 Fp16 Grad Scale: 32768 Required: 14 hours 
Training: 2022-04-27 08:49:36,252-Speed 3175.72 samples/sec Loss 5.7881 LearningRate 0.0447 Epoch: 6 Global Step: 82330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:39,388-Speed 3266.21 samples/sec Loss 5.8809 LearningRate 0.0447 Epoch: 6 Global Step: 82340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:42,510-Speed 3281.01 samples/sec Loss 5.7645 LearningRate 0.0447 Epoch: 6 Global Step: 82350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:45,582-Speed 3334.59 samples/sec Loss 5.8593 LearningRate 0.0447 Epoch: 6 Global Step: 82360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:48,760-Speed 3223.42 samples/sec Loss 5.6844 LearningRate 0.0447 Epoch: 6 Global Step: 82370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:51,859-Speed 3304.66 samples/sec Loss 5.8954 LearningRate 0.0447 Epoch: 6 Global Step: 82380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:49:54,946-Speed 3318.00 samples/sec Loss 5.7286 LearningRate 0.0447 Epoch: 6 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:49:58,036-Speed 3315.77 samples/sec Loss 5.8767 LearningRate 0.0447 Epoch: 6 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:01,169-Speed 3270.51 samples/sec Loss 5.8322 LearningRate 0.0447 Epoch: 6 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:04,316-Speed 3255.46 samples/sec Loss 5.7163 LearningRate 0.0447 Epoch: 6 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:07,427-Speed 3292.11 samples/sec Loss 5.6920 LearningRate 0.0446 Epoch: 6 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:10,486-Speed 3349.15 samples/sec Loss 5.7734 LearningRate 0.0446 Epoch: 6 Global Step: 82440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:13,591-Speed 3298.14 
samples/sec Loss 5.7343 LearningRate 0.0446 Epoch: 6 Global Step: 82450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:16,737-Speed 3256.82 samples/sec Loss 5.7171 LearningRate 0.0446 Epoch: 6 Global Step: 82460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:19,909-Speed 3228.82 samples/sec Loss 5.8513 LearningRate 0.0446 Epoch: 6 Global Step: 82470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:23,033-Speed 3279.00 samples/sec Loss 5.7224 LearningRate 0.0446 Epoch: 6 Global Step: 82480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:26,167-Speed 3268.10 samples/sec Loss 5.8443 LearningRate 0.0446 Epoch: 6 Global Step: 82490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:29,241-Speed 3332.22 samples/sec Loss 5.7516 LearningRate 0.0446 Epoch: 6 Global Step: 82500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:32,313-Speed 3333.64 samples/sec Loss 5.8361 LearningRate 0.0446 Epoch: 6 Global Step: 82510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:35,551-Speed 3164.40 samples/sec Loss 5.9624 LearningRate 0.0446 Epoch: 6 Global Step: 82520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:38,698-Speed 3254.24 samples/sec Loss 5.8449 LearningRate 0.0446 Epoch: 6 Global Step: 82530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:50:41,868-Speed 3232.16 samples/sec Loss 5.8964 LearningRate 0.0446 Epoch: 6 Global Step: 82540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:44,979-Speed 3292.43 samples/sec Loss 5.9307 LearningRate 0.0446 Epoch: 6 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:48,105-Speed 3276.99 samples/sec Loss 5.8318 LearningRate 0.0446 Epoch: 6 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:51,232-Speed 3274.78 samples/sec Loss 5.7240 LearningRate 0.0446 Epoch: 6 
Global Step: 82570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:54,380-Speed 3254.33 samples/sec Loss 5.7614 LearningRate 0.0446 Epoch: 6 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:50:57,462-Speed 3323.57 samples/sec Loss 5.7721 LearningRate 0.0446 Epoch: 6 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:51:00,554-Speed 3314.27 samples/sec Loss 5.7651 LearningRate 0.0446 Epoch: 6 Global Step: 82600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:03,778-Speed 3176.86 samples/sec Loss 5.7643 LearningRate 0.0446 Epoch: 6 Global Step: 82610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:06,883-Speed 3298.93 samples/sec Loss 5.8287 LearningRate 0.0445 Epoch: 6 Global Step: 82620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:09,955-Speed 3334.74 samples/sec Loss 5.8110 LearningRate 0.0445 Epoch: 6 Global Step: 82630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:13,046-Speed 3313.68 samples/sec Loss 5.7547 LearningRate 0.0445 Epoch: 6 Global Step: 82640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:16,166-Speed 3283.19 samples/sec Loss 5.7328 LearningRate 0.0445 Epoch: 6 Global Step: 82650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:19,231-Speed 3342.32 samples/sec Loss 5.6667 LearningRate 0.0445 Epoch: 6 Global Step: 82660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:22,305-Speed 3332.45 samples/sec Loss 5.6682 LearningRate 0.0445 Epoch: 6 Global Step: 82670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:25,388-Speed 3321.86 samples/sec Loss 5.8565 LearningRate 0.0445 Epoch: 6 Global Step: 82680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:28,579-Speed 3210.45 samples/sec Loss 5.8066 LearningRate 0.0445 Epoch: 6 Global Step: 82690 Fp16 Grad Scale: 16384 Required: 14 
hours Training: 2022-04-27 08:51:31,713-Speed 3268.42 samples/sec Loss 5.8188 LearningRate 0.0445 Epoch: 6 Global Step: 82700 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:34,804-Speed 3314.15 samples/sec Loss 5.8442 LearningRate 0.0445 Epoch: 6 Global Step: 82710 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:37,906-Speed 3301.81 samples/sec Loss 5.8001 LearningRate 0.0445 Epoch: 6 Global Step: 82720 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:41,007-Speed 3304.06 samples/sec Loss 5.8538 LearningRate 0.0445 Epoch: 6 Global Step: 82730 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:44,084-Speed 3328.00 samples/sec Loss 5.7473 LearningRate 0.0445 Epoch: 6 Global Step: 82740 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:47,208-Speed 3279.89 samples/sec Loss 5.7700 LearningRate 0.0445 Epoch: 6 Global Step: 82750 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:51:50,308-Speed 3304.12 samples/sec Loss 5.8124 LearningRate 0.0445 Epoch: 6 Global Step: 82760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:53,392-Speed 3321.40 samples/sec Loss 5.7176 LearningRate 0.0445 Epoch: 6 Global Step: 82770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:56,485-Speed 3310.95 samples/sec Loss 5.8333 LearningRate 0.0445 Epoch: 6 Global Step: 82780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:51:59,551-Speed 3341.09 samples/sec Loss 5.8673 LearningRate 0.0445 Epoch: 6 Global Step: 82790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:02,659-Speed 3296.02 samples/sec Loss 5.7572 LearningRate 0.0444 Epoch: 6 Global Step: 82800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:05,785-Speed 3276.48 samples/sec Loss 5.8611 LearningRate 0.0444 Epoch: 6 Global Step: 82810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:08,840-Speed 3353.20 
samples/sec Loss 5.7669 LearningRate 0.0444 Epoch: 6 Global Step: 82820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:11,972-Speed 3270.05 samples/sec Loss 5.7432 LearningRate 0.0444 Epoch: 6 Global Step: 82830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:15,053-Speed 3325.65 samples/sec Loss 5.6576 LearningRate 0.0444 Epoch: 6 Global Step: 82840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:18,148-Speed 3309.58 samples/sec Loss 5.8123 LearningRate 0.0444 Epoch: 6 Global Step: 82850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:21,225-Speed 3328.63 samples/sec Loss 5.8666 LearningRate 0.0444 Epoch: 6 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:52:24,329-Speed 3299.84 samples/sec Loss 5.7867 LearningRate 0.0444 Epoch: 6 Global Step: 82870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:27,425-Speed 3308.23 samples/sec Loss 5.7703 LearningRate 0.0444 Epoch: 6 Global Step: 82880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:30,584-Speed 3242.68 samples/sec Loss 5.7853 LearningRate 0.0444 Epoch: 6 Global Step: 82890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:33,640-Speed 3352.10 samples/sec Loss 5.7607 LearningRate 0.0444 Epoch: 6 Global Step: 82900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:36,779-Speed 3263.62 samples/sec Loss 5.7877 LearningRate 0.0444 Epoch: 6 Global Step: 82910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:39,893-Speed 3289.39 samples/sec Loss 5.7563 LearningRate 0.0444 Epoch: 6 Global Step: 82920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:42,988-Speed 3309.34 samples/sec Loss 5.8244 LearningRate 0.0444 Epoch: 6 Global Step: 82930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:46,041-Speed 3355.93 samples/sec Loss 5.9285 LearningRate 0.0444 Epoch: 6 
Global Step: 82940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:49,164-Speed 3279.59 samples/sec Loss 5.8955 LearningRate 0.0444 Epoch: 6 Global Step: 82950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:52,253-Speed 3315.54 samples/sec Loss 5.9225 LearningRate 0.0444 Epoch: 6 Global Step: 82960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:52:55,384-Speed 3272.22 samples/sec Loss 5.8825 LearningRate 0.0444 Epoch: 6 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:52:58,446-Speed 3344.69 samples/sec Loss 5.7701 LearningRate 0.0444 Epoch: 6 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:01,508-Speed 3346.28 samples/sec Loss 5.8484 LearningRate 0.0443 Epoch: 6 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:04,588-Speed 3325.25 samples/sec Loss 5.6881 LearningRate 0.0443 Epoch: 6 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:07,667-Speed 3326.71 samples/sec Loss 5.7804 LearningRate 0.0443 Epoch: 6 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:10,728-Speed 3346.86 samples/sec Loss 5.9176 LearningRate 0.0443 Epoch: 6 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:13,843-Speed 3287.82 samples/sec Loss 5.7801 LearningRate 0.0443 Epoch: 6 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:16,924-Speed 3324.67 samples/sec Loss 5.7476 LearningRate 0.0443 Epoch: 6 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:19,997-Speed 3333.62 samples/sec Loss 5.9139 LearningRate 0.0443 Epoch: 6 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:23,098-Speed 3303.30 samples/sec Loss 5.8263 LearningRate 0.0443 Epoch: 6 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-27 08:53:26,190-Speed 3312.26 samples/sec Loss 5.8725 LearningRate 0.0443 Epoch: 6 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:53:29,239-Speed 3359.69 samples/sec Loss 5.8580 LearningRate 0.0443 Epoch: 6 Global Step: 83080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:32,341-Speed 3302.19 samples/sec Loss 5.7704 LearningRate 0.0443 Epoch: 6 Global Step: 83090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:35,418-Speed 3328.88 samples/sec Loss 5.7885 LearningRate 0.0443 Epoch: 6 Global Step: 83100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:38,500-Speed 3323.47 samples/sec Loss 5.7725 LearningRate 0.0443 Epoch: 6 Global Step: 83110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:41,586-Speed 3319.51 samples/sec Loss 5.8135 LearningRate 0.0443 Epoch: 6 Global Step: 83120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:44,656-Speed 3336.53 samples/sec Loss 5.8773 LearningRate 0.0443 Epoch: 6 Global Step: 83130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:47,735-Speed 3327.55 samples/sec Loss 5.8136 LearningRate 0.0443 Epoch: 6 Global Step: 83140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:50,830-Speed 3309.18 samples/sec Loss 5.7611 LearningRate 0.0443 Epoch: 6 Global Step: 83150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:53,889-Speed 3347.84 samples/sec Loss 5.8632 LearningRate 0.0443 Epoch: 6 Global Step: 83160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:53:56,938-Speed 3360.00 samples/sec Loss 5.8049 LearningRate 0.0442 Epoch: 6 Global Step: 83170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:00,041-Speed 3300.74 samples/sec Loss 5.8250 LearningRate 0.0442 Epoch: 6 Global Step: 83180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:03,186-Speed 3257.30 
samples/sec Loss 5.7209 LearningRate 0.0442 Epoch: 6 Global Step: 83190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:06,334-Speed 3253.88 samples/sec Loss 5.8464 LearningRate 0.0442 Epoch: 6 Global Step: 83200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:09,389-Speed 3353.57 samples/sec Loss 5.7928 LearningRate 0.0442 Epoch: 6 Global Step: 83210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:12,464-Speed 3331.04 samples/sec Loss 5.8880 LearningRate 0.0442 Epoch: 6 Global Step: 83220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:15,523-Speed 3348.92 samples/sec Loss 5.8237 LearningRate 0.0442 Epoch: 6 Global Step: 83230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:18,588-Speed 3342.20 samples/sec Loss 5.8016 LearningRate 0.0442 Epoch: 6 Global Step: 83240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:21,639-Speed 3357.13 samples/sec Loss 5.7313 LearningRate 0.0442 Epoch: 6 Global Step: 83250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:24,729-Speed 3314.11 samples/sec Loss 5.8565 LearningRate 0.0442 Epoch: 6 Global Step: 83260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:27,906-Speed 3224.79 samples/sec Loss 5.8078 LearningRate 0.0442 Epoch: 6 Global Step: 83270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:31,059-Speed 3249.20 samples/sec Loss 5.7827 LearningRate 0.0442 Epoch: 6 Global Step: 83280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:54:34,126-Speed 3339.62 samples/sec Loss 5.7548 LearningRate 0.0442 Epoch: 6 Global Step: 83290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:37,209-Speed 3322.39 samples/sec Loss 5.7397 LearningRate 0.0442 Epoch: 6 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:40,289-Speed 3325.93 samples/sec Loss 5.7906 LearningRate 0.0442 Epoch: 6 
Global Step: 83310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:43,413-Speed 3278.61 samples/sec Loss 5.7283 LearningRate 0.0442 Epoch: 6 Global Step: 83320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:46,482-Speed 3337.99 samples/sec Loss 5.8438 LearningRate 0.0442 Epoch: 6 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:49,561-Speed 3326.23 samples/sec Loss 5.7938 LearningRate 0.0442 Epoch: 6 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:52,636-Speed 3331.00 samples/sec Loss 5.8657 LearningRate 0.0442 Epoch: 6 Global Step: 83350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:55,721-Speed 3321.16 samples/sec Loss 5.8138 LearningRate 0.0441 Epoch: 6 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:54:58,876-Speed 3246.46 samples/sec Loss 5.7743 LearningRate 0.0441 Epoch: 6 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:02,022-Speed 3256.65 samples/sec Loss 5.7284 LearningRate 0.0441 Epoch: 6 Global Step: 83380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:05,065-Speed 3364.95 samples/sec Loss 5.6770 LearningRate 0.0441 Epoch: 6 Global Step: 83390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:08,177-Speed 3291.94 samples/sec Loss 5.7993 LearningRate 0.0441 Epoch: 6 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:11,257-Speed 3325.90 samples/sec Loss 5.8013 LearningRate 0.0441 Epoch: 6 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:14,409-Speed 3249.52 samples/sec Loss 5.9635 LearningRate 0.0441 Epoch: 6 Global Step: 83420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:17,518-Speed 3295.31 samples/sec Loss 5.7764 LearningRate 0.0441 Epoch: 6 Global Step: 83430 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 08:55:20,591-Speed 3333.46 samples/sec Loss 5.7210 LearningRate 0.0441 Epoch: 6 Global Step: 83440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:23,674-Speed 3322.03 samples/sec Loss 5.8915 LearningRate 0.0441 Epoch: 6 Global Step: 83450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:26,734-Speed 3348.32 samples/sec Loss 5.9030 LearningRate 0.0441 Epoch: 6 Global Step: 83460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:29,835-Speed 3303.30 samples/sec Loss 5.7949 LearningRate 0.0441 Epoch: 6 Global Step: 83470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:32,951-Speed 3286.52 samples/sec Loss 5.8992 LearningRate 0.0441 Epoch: 6 Global Step: 83480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:36,066-Speed 3288.52 samples/sec Loss 5.9511 LearningRate 0.0441 Epoch: 6 Global Step: 83490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:39,226-Speed 3242.04 samples/sec Loss 5.7530 LearningRate 0.0441 Epoch: 6 Global Step: 83500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:42,301-Speed 3331.37 samples/sec Loss 5.7686 LearningRate 0.0441 Epoch: 6 Global Step: 83510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:55:45,399-Speed 3306.42 samples/sec Loss 5.7075 LearningRate 0.0441 Epoch: 6 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:48,479-Speed 3325.65 samples/sec Loss 5.7806 LearningRate 0.0441 Epoch: 6 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:51,580-Speed 3303.77 samples/sec Loss 5.7624 LearningRate 0.0441 Epoch: 6 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:54,714-Speed 3267.41 samples/sec Loss 5.7895 LearningRate 0.0440 Epoch: 6 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:55:57,793-Speed 3328.15 
samples/sec Loss 5.8728 LearningRate 0.0440 Epoch: 6 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:00,888-Speed 3308.87 samples/sec Loss 5.8349 LearningRate 0.0440 Epoch: 6 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:04,019-Speed 3271.33 samples/sec Loss 5.7528 LearningRate 0.0440 Epoch: 6 Global Step: 83580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:07,185-Speed 3235.75 samples/sec Loss 5.8117 LearningRate 0.0440 Epoch: 6 Global Step: 83590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:10,299-Speed 3289.49 samples/sec Loss 5.7194 LearningRate 0.0440 Epoch: 6 Global Step: 83600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:13,467-Speed 3233.25 samples/sec Loss 5.8925 LearningRate 0.0440 Epoch: 6 Global Step: 83610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:16,575-Speed 3295.77 samples/sec Loss 5.7788 LearningRate 0.0440 Epoch: 6 Global Step: 83620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:19,742-Speed 3234.36 samples/sec Loss 5.7723 LearningRate 0.0440 Epoch: 6 Global Step: 83630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:22,849-Speed 3297.02 samples/sec Loss 5.7668 LearningRate 0.0440 Epoch: 6 Global Step: 83640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:25,932-Speed 3322.99 samples/sec Loss 5.8339 LearningRate 0.0440 Epoch: 6 Global Step: 83650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:29,043-Speed 3292.19 samples/sec Loss 5.7182 LearningRate 0.0440 Epoch: 6 Global Step: 83660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:32,104-Speed 3345.83 samples/sec Loss 5.7378 LearningRate 0.0440 Epoch: 6 Global Step: 83670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:35,237-Speed 3270.04 samples/sec Loss 5.8392 LearningRate 0.0440 Epoch: 6 
Global Step: 83680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:38,355-Speed 3285.49 samples/sec Loss 5.6985 LearningRate 0.0440 Epoch: 6 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:41,557-Speed 3198.58 samples/sec Loss 5.6714 LearningRate 0.0440 Epoch: 6 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:44,649-Speed 3312.57 samples/sec Loss 5.7844 LearningRate 0.0440 Epoch: 6 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:47,797-Speed 3253.93 samples/sec Loss 5.7766 LearningRate 0.0440 Epoch: 6 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:50,938-Speed 3261.20 samples/sec Loss 5.7215 LearningRate 0.0440 Epoch: 6 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:56:54,084-Speed 3256.07 samples/sec Loss 5.6461 LearningRate 0.0439 Epoch: 6 Global Step: 83740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:56:57,141-Speed 3350.52 samples/sec Loss 5.7781 LearningRate 0.0439 Epoch: 6 Global Step: 83750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:00,260-Speed 3284.93 samples/sec Loss 5.8634 LearningRate 0.0439 Epoch: 6 Global Step: 83760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:03,310-Speed 3357.39 samples/sec Loss 5.8706 LearningRate 0.0439 Epoch: 6 Global Step: 83770 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:06,443-Speed 3269.65 samples/sec Loss 5.7834 LearningRate 0.0439 Epoch: 6 Global Step: 83780 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:09,552-Speed 3295.56 samples/sec Loss 5.8308 LearningRate 0.0439 Epoch: 6 Global Step: 83790 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:12,689-Speed 3265.21 samples/sec Loss 5.7909 LearningRate 0.0439 Epoch: 6 Global Step: 83800 Fp16 Grad Scale: 16384 Required: 14 
hours Training: 2022-04-27 08:57:15,843-Speed 3247.14 samples/sec Loss 5.8773 LearningRate 0.0439 Epoch: 6 Global Step: 83810 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:18,988-Speed 3257.15 samples/sec Loss 5.8300 LearningRate 0.0439 Epoch: 6 Global Step: 83820 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:22,083-Speed 3309.88 samples/sec Loss 5.7091 LearningRate 0.0439 Epoch: 6 Global Step: 83830 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:25,227-Speed 3257.28 samples/sec Loss 5.8399 LearningRate 0.0439 Epoch: 6 Global Step: 83840 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:28,316-Speed 3317.09 samples/sec Loss 5.7575 LearningRate 0.0439 Epoch: 6 Global Step: 83850 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:31,421-Speed 3298.96 samples/sec Loss 5.8692 LearningRate 0.0439 Epoch: 6 Global Step: 83860 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:57:34,531-Speed 3293.43 samples/sec Loss 5.7550 LearningRate 0.0439 Epoch: 6 Global Step: 83870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:37,662-Speed 3270.93 samples/sec Loss 5.9253 LearningRate 0.0439 Epoch: 6 Global Step: 83880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:40,798-Speed 3266.80 samples/sec Loss 5.8084 LearningRate 0.0439 Epoch: 6 Global Step: 83890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:43,944-Speed 3255.18 samples/sec Loss 5.9204 LearningRate 0.0439 Epoch: 6 Global Step: 83900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:47,093-Speed 3253.12 samples/sec Loss 5.8166 LearningRate 0.0439 Epoch: 6 Global Step: 83910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:50,195-Speed 3303.01 samples/sec Loss 5.8416 LearningRate 0.0438 Epoch: 6 Global Step: 83920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:53,298-Speed 3300.67 
samples/sec Loss 5.7582 LearningRate 0.0438 Epoch: 6 Global Step: 83930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:56,356-Speed 3349.85 samples/sec Loss 5.7940 LearningRate 0.0438 Epoch: 6 Global Step: 83940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:57:59,458-Speed 3302.07 samples/sec Loss 5.8760 LearningRate 0.0438 Epoch: 6 Global Step: 83950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:02,613-Speed 3247.15 samples/sec Loss 5.7531 LearningRate 0.0438 Epoch: 6 Global Step: 83960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:05,742-Speed 3273.65 samples/sec Loss 5.7701 LearningRate 0.0438 Epoch: 6 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:08,801-Speed 3347.99 samples/sec Loss 5.7988 LearningRate 0.0438 Epoch: 6 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:11,861-Speed 3347.49 samples/sec Loss 5.6795 LearningRate 0.0438 Epoch: 6 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:14,995-Speed 3268.46 samples/sec Loss 5.7655 LearningRate 0.0438 Epoch: 6 Global Step: 84000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:18,051-Speed 3352.19 samples/sec Loss 5.8277 LearningRate 0.0438 Epoch: 6 Global Step: 84010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:21,110-Speed 3348.35 samples/sec Loss 5.6452 LearningRate 0.0438 Epoch: 6 Global Step: 84020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:24,194-Speed 3321.86 samples/sec Loss 5.8807 LearningRate 0.0438 Epoch: 6 Global Step: 84030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:27,294-Speed 3304.57 samples/sec Loss 5.8717 LearningRate 0.0438 Epoch: 6 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:30,381-Speed 3317.25 samples/sec Loss 5.8280 LearningRate 0.0438 Epoch: 6 
Global Step: 84050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:33,436-Speed 3353.49 samples/sec Loss 5.6650 LearningRate 0.0438 Epoch: 6 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 08:58:36,507-Speed 3335.56 samples/sec Loss 5.8289 LearningRate 0.0438 Epoch: 6 Global Step: 84070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 08:58:39,578-Speed 3335.38 samples/sec Loss 5.7141 LearningRate 0.0438 Epoch: 6 Global Step: 84080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:42,671-Speed 3311.94 samples/sec Loss 5.8203 LearningRate 0.0438 Epoch: 6 Global Step: 84090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:45,721-Speed 3358.04 samples/sec Loss 5.7435 LearningRate 0.0438 Epoch: 6 Global Step: 84100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:48,815-Speed 3310.51 samples/sec Loss 5.7350 LearningRate 0.0437 Epoch: 6 Global Step: 84110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:51,972-Speed 3245.56 samples/sec Loss 5.8158 LearningRate 0.0437 Epoch: 6 Global Step: 84120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:55,102-Speed 3272.56 samples/sec Loss 5.8241 LearningRate 0.0437 Epoch: 6 Global Step: 84130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:58:58,200-Speed 3305.90 samples/sec Loss 5.8174 LearningRate 0.0437 Epoch: 6 Global Step: 84140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:01,336-Speed 3266.87 samples/sec Loss 5.7898 LearningRate 0.0437 Epoch: 6 Global Step: 84150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:04,476-Speed 3262.65 samples/sec Loss 5.8147 LearningRate 0.0437 Epoch: 6 Global Step: 84160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:07,541-Speed 3341.99 samples/sec Loss 5.7694 LearningRate 0.0437 Epoch: 6 Global Step: 84170 Fp16 Grad Scale: 16384 Required: 14 
hours Training: 2022-04-27 08:59:10,615-Speed 3332.05 samples/sec Loss 5.8402 LearningRate 0.0437 Epoch: 6 Global Step: 84180 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:13,713-Speed 3306.79 samples/sec Loss 5.8705 LearningRate 0.0437 Epoch: 6 Global Step: 84190 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:16,826-Speed 3290.31 samples/sec Loss 5.7875 LearningRate 0.0437 Epoch: 6 Global Step: 84200 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:19,919-Speed 3311.91 samples/sec Loss 5.8888 LearningRate 0.0437 Epoch: 6 Global Step: 84210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:23,034-Speed 3288.38 samples/sec Loss 5.7449 LearningRate 0.0437 Epoch: 6 Global Step: 84220 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:26,098-Speed 3342.96 samples/sec Loss 5.7661 LearningRate 0.0437 Epoch: 6 Global Step: 84230 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:29,194-Speed 3308.39 samples/sec Loss 5.7415 LearningRate 0.0437 Epoch: 6 Global Step: 84240 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:32,243-Speed 3359.11 samples/sec Loss 5.8082 LearningRate 0.0437 Epoch: 6 Global Step: 84250 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:35,357-Speed 3289.72 samples/sec Loss 5.8275 LearningRate 0.0437 Epoch: 6 Global Step: 84260 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 08:59:38,437-Speed 3325.48 samples/sec Loss 5.8741 LearningRate 0.0437 Epoch: 6 Global Step: 84270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:41,543-Speed 3297.88 samples/sec Loss 5.7239 LearningRate 0.0437 Epoch: 6 Global Step: 84280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:44,607-Speed 3343.42 samples/sec Loss 5.7741 LearningRate 0.0437 Epoch: 6 Global Step: 84290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:47,705-Speed 3306.35 
samples/sec Loss 5.7865 LearningRate 0.0436 Epoch: 6 Global Step: 84300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:50,788-Speed 3322.56 samples/sec Loss 5.7935 LearningRate 0.0436 Epoch: 6 Global Step: 84310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:53,918-Speed 3272.15 samples/sec Loss 5.7681 LearningRate 0.0436 Epoch: 6 Global Step: 84320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 08:59:56,992-Speed 3332.79 samples/sec Loss 5.7295 LearningRate 0.0436 Epoch: 6 Global Step: 84330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:00,079-Speed 3317.50 samples/sec Loss 5.7303 LearningRate 0.0436 Epoch: 6 Global Step: 84340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:03,149-Speed 3337.29 samples/sec Loss 5.8531 LearningRate 0.0436 Epoch: 6 Global Step: 84350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:06,279-Speed 3272.66 samples/sec Loss 5.8108 LearningRate 0.0436 Epoch: 6 Global Step: 84360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:09,338-Speed 3348.23 samples/sec Loss 5.8326 LearningRate 0.0436 Epoch: 6 Global Step: 84370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:12,431-Speed 3311.20 samples/sec Loss 5.7332 LearningRate 0.0436 Epoch: 6 Global Step: 84380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:15,520-Speed 3317.12 samples/sec Loss 5.8059 LearningRate 0.0436 Epoch: 6 Global Step: 84390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:18,666-Speed 3255.79 samples/sec Loss 5.7601 LearningRate 0.0436 Epoch: 6 Global Step: 84400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:21,732-Speed 3340.74 samples/sec Loss 5.8006 LearningRate 0.0436 Epoch: 6 Global Step: 84410 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:24,883-Speed 3250.26 samples/sec Loss 5.8132 LearningRate 0.0436 Epoch: 6 
Global Step: 84420 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:27,988-Speed 3298.90 samples/sec Loss 5.7413 LearningRate 0.0436 Epoch: 6 Global Step: 84430 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:31,139-Speed 3251.28 samples/sec Loss 5.7650 LearningRate 0.0436 Epoch: 6 Global Step: 84440 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:34,214-Speed 3331.55 samples/sec Loss 5.7646 LearningRate 0.0436 Epoch: 6 Global Step: 84450 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:37,289-Speed 3330.90 samples/sec Loss 5.7923 LearningRate 0.0436 Epoch: 6 Global Step: 84460 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:40,393-Speed 3299.08 samples/sec Loss 5.8027 LearningRate 0.0436 Epoch: 6 Global Step: 84470 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:43,504-Speed 3293.17 samples/sec Loss 5.8188 LearningRate 0.0436 Epoch: 6 Global Step: 84480 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:46,568-Speed 3342.61 samples/sec Loss 5.9025 LearningRate 0.0435 Epoch: 6 Global Step: 84490 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:49,667-Speed 3305.85 samples/sec Loss 5.7080 LearningRate 0.0435 Epoch: 6 Global Step: 84500 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:00:52,798-Speed 3271.93 samples/sec Loss 5.7931 LearningRate 0.0435 Epoch: 6 Global Step: 84510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:55,893-Speed 3309.11 samples/sec Loss 5.7984 LearningRate 0.0435 Epoch: 6 Global Step: 84520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:00:58,972-Speed 3327.03 samples/sec Loss 5.8051 LearningRate 0.0435 Epoch: 6 Global Step: 84530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:02,068-Speed 3309.05 samples/sec Loss 5.7329 LearningRate 0.0435 Epoch: 6 Global Step: 84540 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:01:05,224-Speed 3244.88 samples/sec Loss 5.8304 LearningRate 0.0435 Epoch: 6 Global Step: 84550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:08,380-Speed 3246.55 samples/sec Loss 5.7915 LearningRate 0.0435 Epoch: 6 Global Step: 84560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:11,504-Speed 3277.67 samples/sec Loss 5.9059 LearningRate 0.0435 Epoch: 6 Global Step: 84570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:14,667-Speed 3239.14 samples/sec Loss 5.8427 LearningRate 0.0435 Epoch: 6 Global Step: 84580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:17,864-Speed 3203.32 samples/sec Loss 5.8106 LearningRate 0.0435 Epoch: 6 Global Step: 84590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:20,954-Speed 3315.35 samples/sec Loss 5.8160 LearningRate 0.0435 Epoch: 6 Global Step: 84600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:24,056-Speed 3302.87 samples/sec Loss 5.8309 LearningRate 0.0435 Epoch: 6 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:01:27,159-Speed 3300.83 samples/sec Loss 5.7796 LearningRate 0.0435 Epoch: 6 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:01:30,289-Speed 3271.97 samples/sec Loss 5.8418 LearningRate 0.0435 Epoch: 6 Global Step: 84630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:01:33,378-Speed 3316.49 samples/sec Loss 5.8425 LearningRate 0.0435 Epoch: 6 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:01:36,441-Speed 3344.62 samples/sec Loss 5.6664 LearningRate 0.0435 Epoch: 6 Global Step: 84650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:39,554-Speed 3289.67 samples/sec Loss 5.8098 LearningRate 0.0435 Epoch: 6 Global Step: 84660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:42,746-Speed 3209.47 
samples/sec Loss 5.7854 LearningRate 0.0434 Epoch: 6 Global Step: 84670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:45,874-Speed 3274.26 samples/sec Loss 5.8737 LearningRate 0.0434 Epoch: 6 Global Step: 84680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:49,115-Speed 3160.75 samples/sec Loss 5.7188 LearningRate 0.0434 Epoch: 6 Global Step: 84690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:52,292-Speed 3224.98 samples/sec Loss 5.7974 LearningRate 0.0434 Epoch: 6 Global Step: 84700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:55,357-Speed 3341.19 samples/sec Loss 6.0032 LearningRate 0.0434 Epoch: 6 Global Step: 84710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:01:58,497-Speed 3261.96 samples/sec Loss 5.7544 LearningRate 0.0434 Epoch: 6 Global Step: 84720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:01,572-Speed 3331.37 samples/sec Loss 5.8361 LearningRate 0.0434 Epoch: 6 Global Step: 84730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:04,684-Speed 3291.15 samples/sec Loss 5.8464 LearningRate 0.0434 Epoch: 6 Global Step: 84740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:07,767-Speed 3322.35 samples/sec Loss 5.8191 LearningRate 0.0434 Epoch: 6 Global Step: 84750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:10,857-Speed 3315.28 samples/sec Loss 5.8211 LearningRate 0.0434 Epoch: 6 Global Step: 84760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:13,985-Speed 3275.25 samples/sec Loss 5.8850 LearningRate 0.0434 Epoch: 6 Global Step: 84770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:17,067-Speed 3323.46 samples/sec Loss 5.7636 LearningRate 0.0434 Epoch: 6 Global Step: 84780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:20,148-Speed 3324.76 samples/sec Loss 5.7469 LearningRate 0.0434 Epoch: 6 
Global Step: 84790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:23,257-Speed 3294.43 samples/sec Loss 5.9097 LearningRate 0.0434 Epoch: 6 Global Step: 84800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:26,422-Speed 3236.87 samples/sec Loss 5.8461 LearningRate 0.0434 Epoch: 6 Global Step: 84810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:29,613-Speed 3209.75 samples/sec Loss 5.7350 LearningRate 0.0434 Epoch: 6 Global Step: 84820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:32,694-Speed 3324.64 samples/sec Loss 5.7499 LearningRate 0.0434 Epoch: 6 Global Step: 84830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:35,792-Speed 3306.46 samples/sec Loss 5.8704 LearningRate 0.0434 Epoch: 6 Global Step: 84840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:38,878-Speed 3319.49 samples/sec Loss 5.7963 LearningRate 0.0434 Epoch: 6 Global Step: 84850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:41,952-Speed 3332.14 samples/sec Loss 5.7123 LearningRate 0.0433 Epoch: 6 Global Step: 84860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:45,008-Speed 3352.12 samples/sec Loss 5.7193 LearningRate 0.0433 Epoch: 6 Global Step: 84870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:48,056-Speed 3360.19 samples/sec Loss 5.7746 LearningRate 0.0433 Epoch: 6 Global Step: 84880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:51,117-Speed 3347.33 samples/sec Loss 5.6684 LearningRate 0.0433 Epoch: 6 Global Step: 84890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:54,181-Speed 3341.87 samples/sec Loss 5.8083 LearningRate 0.0433 Epoch: 6 Global Step: 84900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:02:57,244-Speed 3345.01 samples/sec Loss 5.7144 LearningRate 0.0433 Epoch: 6 Global Step: 84910 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:03:00,472-Speed 3173.18 samples/sec Loss 5.8357 LearningRate 0.0433 Epoch: 6 Global Step: 84920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:03,571-Speed 3305.57 samples/sec Loss 5.8139 LearningRate 0.0433 Epoch: 6 Global Step: 84930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:06,674-Speed 3300.84 samples/sec Loss 5.8220 LearningRate 0.0433 Epoch: 6 Global Step: 84940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:09,750-Speed 3329.67 samples/sec Loss 5.8235 LearningRate 0.0433 Epoch: 6 Global Step: 84950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:03:12,846-Speed 3309.15 samples/sec Loss 5.7966 LearningRate 0.0433 Epoch: 6 Global Step: 84960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:16,019-Speed 3227.78 samples/sec Loss 5.8138 LearningRate 0.0433 Epoch: 6 Global Step: 84970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:19,095-Speed 3330.46 samples/sec Loss 5.8608 LearningRate 0.0433 Epoch: 6 Global Step: 84980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:22,172-Speed 3329.52 samples/sec Loss 5.7496 LearningRate 0.0433 Epoch: 6 Global Step: 84990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:25,308-Speed 3265.89 samples/sec Loss 5.8313 LearningRate 0.0433 Epoch: 6 Global Step: 85000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:28,379-Speed 3336.03 samples/sec Loss 5.8731 LearningRate 0.0433 Epoch: 6 Global Step: 85010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:31,487-Speed 3295.67 samples/sec Loss 5.7952 LearningRate 0.0433 Epoch: 6 Global Step: 85020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:34,601-Speed 3289.07 samples/sec Loss 6.0219 LearningRate 0.0433 Epoch: 6 Global Step: 85030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:37,678-Speed 3329.52 
samples/sec Loss 5.8690 LearningRate 0.0433 Epoch: 6 Global Step: 85040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:40,769-Speed 3314.11 samples/sec Loss 5.7858 LearningRate 0.0432 Epoch: 6 Global Step: 85050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:03:43,926-Speed 3243.43 samples/sec Loss 5.7127 LearningRate 0.0432 Epoch: 6 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:03:47,016-Speed 3315.53 samples/sec Loss 5.7701 LearningRate 0.0432 Epoch: 6 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:03:50,094-Speed 3328.19 samples/sec Loss 5.8091 LearningRate 0.0432 Epoch: 6 Global Step: 85080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:03:53,279-Speed 3215.99 samples/sec Loss 5.8460 LearningRate 0.0432 Epoch: 6 Global Step: 85090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:03:56,368-Speed 3316.22 samples/sec Loss 5.7803 LearningRate 0.0432 Epoch: 6 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:03:59,509-Speed 3261.10 samples/sec Loss 5.8213 LearningRate 0.0432 Epoch: 6 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:04:02,610-Speed 3302.83 samples/sec Loss 5.6804 LearningRate 0.0432 Epoch: 6 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:04:05,694-Speed 3322.02 samples/sec Loss 5.7576 LearningRate 0.0432 Epoch: 6 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:04:08,736-Speed 3367.57 samples/sec Loss 5.7169 LearningRate 0.0432 Epoch: 6 Global Step: 85140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:04:11,815-Speed 3325.95 samples/sec Loss 5.7983 LearningRate 0.0432 Epoch: 6 Global Step: 85150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:04:14,870-Speed 3353.86 samples/sec Loss 5.8733 LearningRate 0.0432 Epoch: 6 
Global Step: 85160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:04:17,935-Speed 3341.95 samples/sec Loss 5.7756 LearningRate 0.0432 Epoch: 6 Global Step: 85170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:04:21,049-Speed 3289.06 samples/sec Loss 5.7351 LearningRate 0.0432 Epoch: 6 Global Step: 85180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:04:24,141-Speed 3312.65 samples/sec Loss 5.7875 LearningRate 0.0432 Epoch: 6 Global Step: 85190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:04:27,194-Speed 3355.66 samples/sec Loss 5.8062 LearningRate 0.0432 Epoch: 6 Global Step: 85200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:04:30,257-Speed 3343.58 samples/sec Loss 5.8174 LearningRate 0.0432 Epoch: 6 Global Step: 85210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:33,346-Speed 3316.45 samples/sec Loss 5.7972 LearningRate 0.0432 Epoch: 6 Global Step: 85220 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:36,471-Speed 3278.76 samples/sec Loss 5.8480 LearningRate 0.0432 Epoch: 6 Global Step: 85230 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:39,539-Speed 3338.16 samples/sec Loss 5.8963 LearningRate 0.0431 Epoch: 6 Global Step: 85240 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:42,609-Speed 3336.48 samples/sec Loss 5.7139 LearningRate 0.0431 Epoch: 6 Global Step: 85250 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:45,663-Speed 3354.76 samples/sec Loss 5.8226 LearningRate 0.0431 Epoch: 6 Global Step: 85260 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:48,730-Speed 3338.92 samples/sec Loss 5.7165 LearningRate 0.0431 Epoch: 6 Global Step: 85270 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:51,930-Speed 3201.73 samples/sec Loss 5.7827 LearningRate 0.0431 Epoch: 6 Global Step: 85280 Fp16 Grad Scale: 16384 Required: 14 
hours Training: 2022-04-27 09:04:55,091-Speed 3240.19 samples/sec Loss 5.8278 LearningRate 0.0431 Epoch: 6 Global Step: 85290 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:04:58,158-Speed 3339.76 samples/sec Loss 5.7640 LearningRate 0.0431 Epoch: 6 Global Step: 85300 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:05:01,256-Speed 3306.31 samples/sec Loss 5.7826 LearningRate 0.0431 Epoch: 6 Global Step: 85310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:04,382-Speed 3277.00 samples/sec Loss 5.8890 LearningRate 0.0431 Epoch: 6 Global Step: 85320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:07,498-Speed 3286.90 samples/sec Loss 5.8777 LearningRate 0.0431 Epoch: 6 Global Step: 85330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:10,604-Speed 3297.67 samples/sec Loss 5.8205 LearningRate 0.0431 Epoch: 6 Global Step: 85340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:13,697-Speed 3312.09 samples/sec Loss 5.8285 LearningRate 0.0431 Epoch: 6 Global Step: 85350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:16,820-Speed 3279.40 samples/sec Loss 5.8091 LearningRate 0.0431 Epoch: 6 Global Step: 85360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:19,897-Speed 3329.48 samples/sec Loss 5.8178 LearningRate 0.0431 Epoch: 6 Global Step: 85370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:22,989-Speed 3312.91 samples/sec Loss 5.7080 LearningRate 0.0431 Epoch: 6 Global Step: 85380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:26,202-Speed 3188.24 samples/sec Loss 5.8235 LearningRate 0.0431 Epoch: 6 Global Step: 85390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:29,319-Speed 3285.22 samples/sec Loss 5.7639 LearningRate 0.0431 Epoch: 6 Global Step: 85400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:32,380-Speed 3346.96 
samples/sec Loss 5.8234 LearningRate 0.0431 Epoch: 6 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:05:35,591-Speed 3189.60 samples/sec Loss 5.8091 LearningRate 0.0431 Epoch: 6 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:05:38,766-Speed 3226.39 samples/sec Loss 5.8956 LearningRate 0.0430 Epoch: 6 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:05:41,874-Speed 3296.18 samples/sec Loss 5.7911 LearningRate 0.0430 Epoch: 6 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:05:44,989-Speed 3288.34 samples/sec Loss 5.7324 LearningRate 0.0430 Epoch: 6 Global Step: 85450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:48,120-Speed 3271.71 samples/sec Loss 5.7606 LearningRate 0.0430 Epoch: 6 Global Step: 85460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:51,206-Speed 3318.84 samples/sec Loss 5.7450 LearningRate 0.0430 Epoch: 6 Global Step: 85470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:54,313-Speed 3297.44 samples/sec Loss 5.6776 LearningRate 0.0430 Epoch: 6 Global Step: 85480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:05:57,387-Speed 3332.41 samples/sec Loss 5.8006 LearningRate 0.0430 Epoch: 6 Global Step: 85490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:00,447-Speed 3347.93 samples/sec Loss 5.8140 LearningRate 0.0430 Epoch: 6 Global Step: 85500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:03,579-Speed 3269.84 samples/sec Loss 5.8496 LearningRate 0.0430 Epoch: 6 Global Step: 85510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:06,670-Speed 3313.91 samples/sec Loss 5.7143 LearningRate 0.0430 Epoch: 6 Global Step: 85520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:09,736-Speed 3341.07 samples/sec Loss 5.7985 LearningRate 0.0430 Epoch: 6 
Global Step: 85530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:12,857-Speed 3282.03 samples/sec Loss 5.7167 LearningRate 0.0430 Epoch: 6 Global Step: 85540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:15,976-Speed 3284.22 samples/sec Loss 5.7701 LearningRate 0.0430 Epoch: 6 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:06:19,118-Speed 3260.34 samples/sec Loss 5.7029 LearningRate 0.0430 Epoch: 6 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:06:22,221-Speed 3300.49 samples/sec Loss 5.8213 LearningRate 0.0430 Epoch: 6 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:06:25,343-Speed 3282.09 samples/sec Loss 5.8201 LearningRate 0.0430 Epoch: 6 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:06:28,499-Speed 3244.84 samples/sec Loss 5.7100 LearningRate 0.0430 Epoch: 6 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:06:31,578-Speed 3326.87 samples/sec Loss 5.7621 LearningRate 0.0430 Epoch: 6 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:06:34,633-Speed 3352.81 samples/sec Loss 5.8469 LearningRate 0.0430 Epoch: 6 Global Step: 85610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:37,737-Speed 3300.76 samples/sec Loss 5.8176 LearningRate 0.0429 Epoch: 6 Global Step: 85620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:40,848-Speed 3292.39 samples/sec Loss 5.8729 LearningRate 0.0429 Epoch: 6 Global Step: 85630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:43,921-Speed 3333.11 samples/sec Loss 5.6903 LearningRate 0.0429 Epoch: 6 Global Step: 85640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:47,062-Speed 3261.15 samples/sec Loss 5.9578 LearningRate 0.0429 Epoch: 6 Global Step: 85650 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:06:50,220-Speed 3243.50 samples/sec Loss 5.7811 LearningRate 0.0429 Epoch: 6 Global Step: 85660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:53,388-Speed 3233.50 samples/sec Loss 5.8498 LearningRate 0.0429 Epoch: 6 Global Step: 85670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:56,481-Speed 3312.10 samples/sec Loss 5.7633 LearningRate 0.0429 Epoch: 6 Global Step: 85680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:06:59,529-Speed 3360.47 samples/sec Loss 5.7795 LearningRate 0.0429 Epoch: 6 Global Step: 85690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:07:02,642-Speed 3290.21 samples/sec Loss 5.8703 LearningRate 0.0429 Epoch: 6 Global Step: 85700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:07:05,720-Speed 3328.10 samples/sec Loss 5.7572 LearningRate 0.0429 Epoch: 6 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:07:08,816-Speed 3308.77 samples/sec Loss 5.7840 LearningRate 0.0429 Epoch: 6 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:07:11,856-Speed 3369.77 samples/sec Loss 5.8621 LearningRate 0.0429 Epoch: 6 Global Step: 85730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:07:14,956-Speed 3303.85 samples/sec Loss 5.8227 LearningRate 0.0429 Epoch: 6 Global Step: 85740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:07:18,074-Speed 3285.63 samples/sec Loss 5.7808 LearningRate 0.0429 Epoch: 6 Global Step: 85750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:07:21,108-Speed 3375.41 samples/sec Loss 5.8265 LearningRate 0.0429 Epoch: 6 Global Step: 85760 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:24,197-Speed 3316.59 samples/sec Loss 5.7131 LearningRate 0.0429 Epoch: 6 Global Step: 85770 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:27,258-Speed 3346.32 
samples/sec Loss 5.7961 LearningRate 0.0429 Epoch: 6 Global Step: 85780 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:30,354-Speed 3308.41 samples/sec Loss 5.8954 LearningRate 0.0429 Epoch: 6 Global Step: 85790 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:33,418-Speed 3343.50 samples/sec Loss 5.8129 LearningRate 0.0429 Epoch: 6 Global Step: 85800 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:36,552-Speed 3268.76 samples/sec Loss 5.7516 LearningRate 0.0428 Epoch: 6 Global Step: 85810 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:39,600-Speed 3360.32 samples/sec Loss 5.7657 LearningRate 0.0428 Epoch: 6 Global Step: 85820 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:42,735-Speed 3267.55 samples/sec Loss 5.7624 LearningRate 0.0428 Epoch: 6 Global Step: 85830 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:45,793-Speed 3349.09 samples/sec Loss 5.7195 LearningRate 0.0428 Epoch: 6 Global Step: 85840 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:48,885-Speed 3313.77 samples/sec Loss 5.7775 LearningRate 0.0428 Epoch: 6 Global Step: 85850 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:07:51,992-Speed 3296.33 samples/sec Loss 5.8060 LearningRate 0.0428 Epoch: 6 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:07:55,174-Speed 3219.45 samples/sec Loss 5.6989 LearningRate 0.0428 Epoch: 6 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:07:58,235-Speed 3346.52 samples/sec Loss 5.6936 LearningRate 0.0428 Epoch: 6 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:01,408-Speed 3228.46 samples/sec Loss 5.8624 LearningRate 0.0428 Epoch: 6 Global Step: 85890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:04,518-Speed 3293.54 samples/sec Loss 5.8134 LearningRate 0.0428 Epoch: 6 
Global Step: 85900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:07,613-Speed 3309.80 samples/sec Loss 5.7553 LearningRate 0.0428 Epoch: 6 Global Step: 85910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:10,687-Speed 3332.03 samples/sec Loss 5.8707 LearningRate 0.0428 Epoch: 6 Global Step: 85920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:13,849-Speed 3240.18 samples/sec Loss 5.8330 LearningRate 0.0428 Epoch: 6 Global Step: 85930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:16,962-Speed 3289.67 samples/sec Loss 5.7851 LearningRate 0.0428 Epoch: 6 Global Step: 85940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:20,096-Speed 3268.25 samples/sec Loss 5.7131 LearningRate 0.0428 Epoch: 6 Global Step: 85950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:23,214-Speed 3285.36 samples/sec Loss 5.7382 LearningRate 0.0428 Epoch: 6 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:08:26,315-Speed 3304.08 samples/sec Loss 5.7739 LearningRate 0.0428 Epoch: 6 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:08:29,371-Speed 3350.77 samples/sec Loss 5.7473 LearningRate 0.0428 Epoch: 6 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:08:32,471-Speed 3304.40 samples/sec Loss 5.8851 LearningRate 0.0428 Epoch: 6 Global Step: 85990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:35,568-Speed 3307.62 samples/sec Loss 5.7752 LearningRate 0.0427 Epoch: 6 Global Step: 86000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:38,737-Speed 3232.05 samples/sec Loss 5.6513 LearningRate 0.0427 Epoch: 6 Global Step: 86010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:41,843-Speed 3298.32 samples/sec Loss 5.8380 LearningRate 0.0427 Epoch: 6 Global Step: 86020 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:08:44,926-Speed 3322.22 samples/sec Loss 5.7682 LearningRate 0.0427 Epoch: 6 Global Step: 86030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:48,067-Speed 3261.33 samples/sec Loss 5.8236 LearningRate 0.0427 Epoch: 6 Global Step: 86040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:51,191-Speed 3279.64 samples/sec Loss 5.7614 LearningRate 0.0427 Epoch: 6 Global Step: 86050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:54,302-Speed 3292.79 samples/sec Loss 5.7150 LearningRate 0.0427 Epoch: 6 Global Step: 86060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:08:57,375-Speed 3332.84 samples/sec Loss 5.7808 LearningRate 0.0427 Epoch: 6 Global Step: 86070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:00,505-Speed 3272.65 samples/sec Loss 5.7377 LearningRate 0.0427 Epoch: 6 Global Step: 86080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:03,580-Speed 3331.78 samples/sec Loss 5.6993 LearningRate 0.0427 Epoch: 6 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:09:06,724-Speed 3257.63 samples/sec Loss 5.8092 LearningRate 0.0427 Epoch: 6 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:09:09,827-Speed 3300.60 samples/sec Loss 5.9677 LearningRate 0.0427 Epoch: 6 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:09:12,953-Speed 3277.60 samples/sec Loss 5.7539 LearningRate 0.0427 Epoch: 6 Global Step: 86120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:09:16,082-Speed 3273.54 samples/sec Loss 5.7231 LearningRate 0.0427 Epoch: 6 Global Step: 86130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:19,243-Speed 3240.32 samples/sec Loss 5.7970 LearningRate 0.0427 Epoch: 6 Global Step: 86140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:22,368-Speed 3277.92 
samples/sec Loss 5.9109 LearningRate 0.0427 Epoch: 6 Global Step: 86150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:25,515-Speed 3255.17 samples/sec Loss 5.7135 LearningRate 0.0427 Epoch: 6 Global Step: 86160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:28,622-Speed 3296.86 samples/sec Loss 5.7651 LearningRate 0.0427 Epoch: 6 Global Step: 86170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:31,697-Speed 3330.49 samples/sec Loss 5.8651 LearningRate 0.0427 Epoch: 6 Global Step: 86180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:34,843-Speed 3256.15 samples/sec Loss 5.8223 LearningRate 0.0426 Epoch: 6 Global Step: 86190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:37,946-Speed 3300.62 samples/sec Loss 5.7862 LearningRate 0.0426 Epoch: 6 Global Step: 86200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:41,115-Speed 3232.94 samples/sec Loss 5.7815 LearningRate 0.0426 Epoch: 6 Global Step: 86210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:44,189-Speed 3333.79 samples/sec Loss 5.7422 LearningRate 0.0426 Epoch: 6 Global Step: 86220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:09:47,263-Speed 3332.31 samples/sec Loss 5.8412 LearningRate 0.0426 Epoch: 6 Global Step: 86230 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:09:50,395-Speed 3270.48 samples/sec Loss 5.8630 LearningRate 0.0426 Epoch: 6 Global Step: 86240 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:09:53,597-Speed 3199.59 samples/sec Loss 5.7682 LearningRate 0.0426 Epoch: 6 Global Step: 86250 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:09:56,678-Speed 3323.95 samples/sec Loss 5.6829 LearningRate 0.0426 Epoch: 6 Global Step: 86260 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:09:59,770-Speed 3312.23 samples/sec Loss 5.8678 LearningRate 0.0426 Epoch: 6 
Global Step: 86270 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:10:02,931-Speed 3241.31 samples/sec Loss 5.7013 LearningRate 0.0426 Epoch: 6 Global Step: 86280 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:10:06,196-Speed 3137.20 samples/sec Loss 5.7435 LearningRate 0.0426 Epoch: 6 Global Step: 86290 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:10:09,260-Speed 3342.35 samples/sec Loss 5.8393 LearningRate 0.0426 Epoch: 6 Global Step: 86300 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:10:12,338-Speed 3328.36 samples/sec Loss 5.6712 LearningRate 0.0426 Epoch: 6 Global Step: 86310 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:10:15,432-Speed 3311.34 samples/sec Loss 5.7986 LearningRate 0.0426 Epoch: 6 Global Step: 86320 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:10:18,510-Speed 3326.68 samples/sec Loss 5.6791 LearningRate 0.0426 Epoch: 6 Global Step: 86330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:21,588-Speed 3328.95 samples/sec Loss 5.7370 LearningRate 0.0426 Epoch: 6 Global Step: 86340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:24,676-Speed 3317.08 samples/sec Loss 5.8337 LearningRate 0.0426 Epoch: 6 Global Step: 86350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:27,789-Speed 3290.18 samples/sec Loss 5.7407 LearningRate 0.0426 Epoch: 6 Global Step: 86360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:30,931-Speed 3260.39 samples/sec Loss 5.7200 LearningRate 0.0426 Epoch: 6 Global Step: 86370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:34,070-Speed 3264.16 samples/sec Loss 5.8391 LearningRate 0.0425 Epoch: 6 Global Step: 86380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:37,224-Speed 3247.23 samples/sec Loss 5.8297 LearningRate 0.0425 Epoch: 6 Global Step: 86390 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:10:40,297-Speed 3333.88 samples/sec Loss 5.8484 LearningRate 0.0425 Epoch: 6 Global Step: 86400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:43,441-Speed 3257.35 samples/sec Loss 5.8048 LearningRate 0.0425 Epoch: 6 Global Step: 86410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:46,526-Speed 3320.98 samples/sec Loss 5.7290 LearningRate 0.0425 Epoch: 6 Global Step: 86420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:10:49,617-Speed 3312.92 samples/sec Loss 5.6827 LearningRate 0.0425 Epoch: 6 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:10:52,737-Speed 3283.67 samples/sec Loss 5.7312 LearningRate 0.0425 Epoch: 6 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:10:55,888-Speed 3250.68 samples/sec Loss 5.8410 LearningRate 0.0425 Epoch: 6 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:10:58,982-Speed 3310.68 samples/sec Loss 5.7571 LearningRate 0.0425 Epoch: 6 Global Step: 86460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:02,065-Speed 3322.98 samples/sec Loss 5.7126 LearningRate 0.0425 Epoch: 6 Global Step: 86470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:05,145-Speed 3325.48 samples/sec Loss 5.7687 LearningRate 0.0425 Epoch: 6 Global Step: 86480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:08,225-Speed 3325.18 samples/sec Loss 5.8115 LearningRate 0.0425 Epoch: 6 Global Step: 86490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:11,325-Speed 3304.61 samples/sec Loss 5.8377 LearningRate 0.0425 Epoch: 6 Global Step: 86500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:14,395-Speed 3336.86 samples/sec Loss 5.7287 LearningRate 0.0425 Epoch: 6 Global Step: 86510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:17,502-Speed 3296.20 
samples/sec Loss 5.6845 LearningRate 0.0425 Epoch: 6 Global Step: 86520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:20,631-Speed 3274.07 samples/sec Loss 5.8453 LearningRate 0.0425 Epoch: 6 Global Step: 86530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:23,758-Speed 3276.16 samples/sec Loss 5.7855 LearningRate 0.0425 Epoch: 6 Global Step: 86540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:26,887-Speed 3273.47 samples/sec Loss 5.7582 LearningRate 0.0425 Epoch: 6 Global Step: 86550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:30,005-Speed 3285.06 samples/sec Loss 5.7622 LearningRate 0.0425 Epoch: 6 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:33,113-Speed 3296.11 samples/sec Loss 5.7980 LearningRate 0.0424 Epoch: 6 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:36,249-Speed 3266.19 samples/sec Loss 5.6654 LearningRate 0.0424 Epoch: 6 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:39,358-Speed 3294.40 samples/sec Loss 5.6181 LearningRate 0.0424 Epoch: 6 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:42,460-Speed 3302.26 samples/sec Loss 5.7490 LearningRate 0.0424 Epoch: 6 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:45,527-Speed 3339.61 samples/sec Loss 5.7812 LearningRate 0.0424 Epoch: 6 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:48,625-Speed 3306.01 samples/sec Loss 5.8283 LearningRate 0.0424 Epoch: 6 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:51,742-Speed 3286.32 samples/sec Loss 5.8394 LearningRate 0.0424 Epoch: 6 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:11:54,805-Speed 3344.49 samples/sec Loss 5.6546 LearningRate 0.0424 Epoch: 6 
Global Step: 86640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:11:57,864-Speed 3347.93 samples/sec Loss 5.7878 LearningRate 0.0424 Epoch: 6 Global Step: 86650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:00,935-Speed 3335.99 samples/sec Loss 5.7639 LearningRate 0.0424 Epoch: 6 Global Step: 86660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:04,068-Speed 3268.70 samples/sec Loss 5.8357 LearningRate 0.0424 Epoch: 6 Global Step: 86670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:07,157-Speed 3316.62 samples/sec Loss 5.7423 LearningRate 0.0424 Epoch: 6 Global Step: 86680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:10,247-Speed 3314.29 samples/sec Loss 5.7832 LearningRate 0.0424 Epoch: 6 Global Step: 86690 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:13,459-Speed 3189.46 samples/sec Loss 5.7762 LearningRate 0.0424 Epoch: 6 Global Step: 86700 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:16,600-Speed 3260.66 samples/sec Loss 5.7241 LearningRate 0.0424 Epoch: 6 Global Step: 86710 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:19,661-Speed 3345.89 samples/sec Loss 5.7138 LearningRate 0.0424 Epoch: 6 Global Step: 86720 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:22,774-Speed 3291.33 samples/sec Loss 5.8409 LearningRate 0.0424 Epoch: 6 Global Step: 86730 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:25,882-Speed 3295.51 samples/sec Loss 5.8498 LearningRate 0.0424 Epoch: 6 Global Step: 86740 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:12:29,017-Speed 3267.86 samples/sec Loss 5.7532 LearningRate 0.0424 Epoch: 6 Global Step: 86750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:32,105-Speed 3316.37 samples/sec Loss 5.9050 LearningRate 0.0423 Epoch: 6 Global Step: 86760 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:12:35,237-Speed 3270.81 samples/sec Loss 5.7570 LearningRate 0.0423 Epoch: 6 Global Step: 86770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:38,375-Speed 3264.71 samples/sec Loss 5.7519 LearningRate 0.0423 Epoch: 6 Global Step: 86780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:41,489-Speed 3289.03 samples/sec Loss 5.7319 LearningRate 0.0423 Epoch: 6 Global Step: 86790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:44,609-Speed 3283.30 samples/sec Loss 5.8310 LearningRate 0.0423 Epoch: 6 Global Step: 86800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:47,699-Speed 3314.96 samples/sec Loss 5.7243 LearningRate 0.0423 Epoch: 6 Global Step: 86810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:50,829-Speed 3272.21 samples/sec Loss 5.7941 LearningRate 0.0423 Epoch: 6 Global Step: 86820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:54,020-Speed 3210.31 samples/sec Loss 5.7630 LearningRate 0.0423 Epoch: 6 Global Step: 86830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:12:57,118-Speed 3306.53 samples/sec Loss 5.6779 LearningRate 0.0423 Epoch: 6 Global Step: 86840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:13:00,230-Speed 3291.76 samples/sec Loss 5.7656 LearningRate 0.0423 Epoch: 6 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:13:03,339-Speed 3295.14 samples/sec Loss 5.6836 LearningRate 0.0423 Epoch: 6 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:13:06,495-Speed 3245.36 samples/sec Loss 5.7642 LearningRate 0.0423 Epoch: 6 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:13:09,587-Speed 3312.10 samples/sec Loss 5.7892 LearningRate 0.0423 Epoch: 6 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:13:12,697-Speed 3294.55 
samples/sec Loss 5.7076 LearningRate 0.0423 Epoch: 6 Global Step: 86890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:13:15,795-Speed 3305.89 samples/sec Loss 5.7216 LearningRate 0.0423 Epoch: 6 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:13:18,919-Speed 3279.23 samples/sec Loss 5.7782 LearningRate 0.0423 Epoch: 6 Global Step: 86910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:13:21,974-Speed 3352.97 samples/sec Loss 5.6891 LearningRate 0.0423 Epoch: 6 Global Step: 86920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:13:25,088-Speed 3289.22 samples/sec Loss 5.6916 LearningRate 0.0423 Epoch: 6 Global Step: 86930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:13:28,428-Speed 3067.15 samples/sec Loss 5.8552 LearningRate 0.0423 Epoch: 6 Global Step: 86940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:13:59,893-Speed 325.45 samples/sec Loss 5.2976 LearningRate 0.0422 Epoch: 7 Global Step: 86950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:03,133-Speed 3162.25 samples/sec Loss 4.3257 LearningRate 0.0422 Epoch: 7 Global Step: 86960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:06,356-Speed 3178.31 samples/sec Loss 4.3542 LearningRate 0.0422 Epoch: 7 Global Step: 86970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:09,399-Speed 3366.11 samples/sec Loss 4.3194 LearningRate 0.0422 Epoch: 7 Global Step: 86980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:12,529-Speed 3272.21 samples/sec Loss 4.2421 LearningRate 0.0422 Epoch: 7 Global Step: 86990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:15,643-Speed 3289.40 samples/sec Loss 4.3839 LearningRate 0.0422 Epoch: 7 Global Step: 87000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:18,782-Speed 3262.81 samples/sec Loss 4.3706 LearningRate 0.0422 Epoch: 7 
Global Step: 87010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:21,877-Speed 3310.42 samples/sec Loss 4.3842 LearningRate 0.0422 Epoch: 7 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:14:25,080-Speed 3197.40 samples/sec Loss 4.3791 LearningRate 0.0422 Epoch: 7 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:14:28,178-Speed 3306.71 samples/sec Loss 4.3726 LearningRate 0.0422 Epoch: 7 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:14:31,305-Speed 3275.57 samples/sec Loss 4.3333 LearningRate 0.0422 Epoch: 7 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:14:34,477-Speed 3228.76 samples/sec Loss 4.3455 LearningRate 0.0422 Epoch: 7 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:14:37,620-Speed 3259.13 samples/sec Loss 4.3667 LearningRate 0.0422 Epoch: 7 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:14:40,760-Speed 3262.78 samples/sec Loss 4.4366 LearningRate 0.0422 Epoch: 7 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:14:43,858-Speed 3306.71 samples/sec Loss 4.4819 LearningRate 0.0422 Epoch: 7 Global Step: 87090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:47,055-Speed 3203.56 samples/sec Loss 4.4163 LearningRate 0.0422 Epoch: 7 Global Step: 87100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:50,133-Speed 3327.58 samples/sec Loss 4.5705 LearningRate 0.0422 Epoch: 7 Global Step: 87110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:53,283-Speed 3252.24 samples/sec Loss 4.4497 LearningRate 0.0422 Epoch: 7 Global Step: 87120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:14:56,382-Speed 3305.74 samples/sec Loss 4.3970 LearningRate 0.0422 Epoch: 7 Global Step: 87130 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:14:59,450-Speed 3338.50 samples/sec Loss 4.4045 LearningRate 0.0421 Epoch: 7 Global Step: 87140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:15:02,583-Speed 3268.93 samples/sec Loss 4.5195 LearningRate 0.0421 Epoch: 7 Global Step: 87150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:15:05,704-Speed 3281.99 samples/sec Loss 4.4256 LearningRate 0.0421 Epoch: 7 Global Step: 87160 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:08,744-Speed 3369.97 samples/sec Loss 4.2934 LearningRate 0.0421 Epoch: 7 Global Step: 87170 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:11,839-Speed 3309.53 samples/sec Loss 4.4273 LearningRate 0.0421 Epoch: 7 Global Step: 87180 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:14,942-Speed 3302.25 samples/sec Loss 4.4034 LearningRate 0.0421 Epoch: 7 Global Step: 87190 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:18,145-Speed 3198.51 samples/sec Loss 4.4996 LearningRate 0.0421 Epoch: 7 Global Step: 87200 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:21,478-Speed 3072.43 samples/sec Loss 4.4879 LearningRate 0.0421 Epoch: 7 Global Step: 87210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:24,552-Speed 3333.15 samples/sec Loss 4.4963 LearningRate 0.0421 Epoch: 7 Global Step: 87220 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:27,656-Speed 3300.28 samples/sec Loss 4.4249 LearningRate 0.0421 Epoch: 7 Global Step: 87230 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:30,800-Speed 3257.07 samples/sec Loss 4.4919 LearningRate 0.0421 Epoch: 7 Global Step: 87240 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:33,958-Speed 3243.36 samples/sec Loss 4.4273 LearningRate 0.0421 Epoch: 7 Global Step: 87250 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:37,044-Speed 3319.86 
samples/sec Loss 4.5047 LearningRate 0.0421 Epoch: 7 Global Step: 87260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:15:40,233-Speed 3211.95 samples/sec Loss 4.4918 LearningRate 0.0421 Epoch: 7 Global Step: 87270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:15:43,370-Speed 3265.43 samples/sec Loss 4.5427 LearningRate 0.0421 Epoch: 7 Global Step: 87280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:15:46,438-Speed 3338.82 samples/sec Loss 4.4961 LearningRate 0.0421 Epoch: 7 Global Step: 87290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:15:49,518-Speed 3326.21 samples/sec Loss 4.5152 LearningRate 0.0421 Epoch: 7 Global Step: 87300 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:52,664-Speed 3255.62 samples/sec Loss 4.5319 LearningRate 0.0421 Epoch: 7 Global Step: 87310 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:55,743-Speed 3327.13 samples/sec Loss 4.4386 LearningRate 0.0421 Epoch: 7 Global Step: 87320 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:15:58,835-Speed 3312.41 samples/sec Loss 4.4685 LearningRate 0.0420 Epoch: 7 Global Step: 87330 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:16:01,989-Speed 3248.14 samples/sec Loss 4.4790 LearningRate 0.0420 Epoch: 7 Global Step: 87340 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:16:05,111-Speed 3281.01 samples/sec Loss 4.5076 LearningRate 0.0420 Epoch: 7 Global Step: 87350 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:16:08,228-Speed 3285.95 samples/sec Loss 4.4609 LearningRate 0.0420 Epoch: 7 Global Step: 87360 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:16:11,327-Speed 3305.00 samples/sec Loss 4.4674 LearningRate 0.0420 Epoch: 7 Global Step: 87370 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:16:14,408-Speed 3325.28 samples/sec Loss 4.5712 LearningRate 0.0420 Epoch: 7 
Global Step: 87380 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:16:17,544-Speed 3266.04 samples/sec Loss 4.5590 LearningRate 0.0420 Epoch: 7 Global Step: 87390 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:16:20,617-Speed 3333.03 samples/sec Loss 4.5293 LearningRate 0.0420 Epoch: 7 Global Step: 87400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:23,696-Speed 3327.30 samples/sec Loss 4.4071 LearningRate 0.0420 Epoch: 7 Global Step: 87410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:26,808-Speed 3291.48 samples/sec Loss 4.5577 LearningRate 0.0420 Epoch: 7 Global Step: 87420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:29,894-Speed 3319.46 samples/sec Loss 4.6735 LearningRate 0.0420 Epoch: 7 Global Step: 87430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:32,942-Speed 3360.89 samples/sec Loss 4.5785 LearningRate 0.0420 Epoch: 7 Global Step: 87440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:36,043-Speed 3302.56 samples/sec Loss 4.4914 LearningRate 0.0420 Epoch: 7 Global Step: 87450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:39,212-Speed 3232.72 samples/sec Loss 4.6158 LearningRate 0.0420 Epoch: 7 Global Step: 87460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:42,296-Speed 3321.26 samples/sec Loss 4.5026 LearningRate 0.0420 Epoch: 7 Global Step: 87470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:45,374-Speed 3328.21 samples/sec Loss 4.6251 LearningRate 0.0420 Epoch: 7 Global Step: 87480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:48,477-Speed 3300.76 samples/sec Loss 4.5650 LearningRate 0.0420 Epoch: 7 Global Step: 87490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:51,567-Speed 3315.29 samples/sec Loss 4.6335 LearningRate 0.0420 Epoch: 7 Global Step: 87500 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:16:54,710-Speed 3259.06 samples/sec Loss 4.5451 LearningRate 0.0420 Epoch: 7 Global Step: 87510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:16:57,785-Speed 3331.69 samples/sec Loss 4.5316 LearningRate 0.0420 Epoch: 7 Global Step: 87520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:00,864-Speed 3326.36 samples/sec Loss 4.5562 LearningRate 0.0419 Epoch: 7 Global Step: 87530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:03,963-Speed 3304.47 samples/sec Loss 4.5828 LearningRate 0.0419 Epoch: 7 Global Step: 87540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:07,027-Speed 3344.00 samples/sec Loss 4.5792 LearningRate 0.0419 Epoch: 7 Global Step: 87550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:10,076-Speed 3359.76 samples/sec Loss 4.6373 LearningRate 0.0419 Epoch: 7 Global Step: 87560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:13,191-Speed 3288.62 samples/sec Loss 4.5439 LearningRate 0.0419 Epoch: 7 Global Step: 87570 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:16,309-Speed 3285.07 samples/sec Loss 4.6282 LearningRate 0.0419 Epoch: 7 Global Step: 87580 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:19,367-Speed 3349.87 samples/sec Loss 4.6003 LearningRate 0.0419 Epoch: 7 Global Step: 87590 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:22,415-Speed 3360.43 samples/sec Loss 4.5654 LearningRate 0.0419 Epoch: 7 Global Step: 87600 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:25,529-Speed 3289.85 samples/sec Loss 4.6248 LearningRate 0.0419 Epoch: 7 Global Step: 87610 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:28,601-Speed 3334.93 samples/sec Loss 4.4872 LearningRate 0.0419 Epoch: 7 Global Step: 87620 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:31,680-Speed 3326.13 
samples/sec Loss 4.5063 LearningRate 0.0419 Epoch: 7 Global Step: 87630 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:34,743-Speed 3343.90 samples/sec Loss 4.6142 LearningRate 0.0419 Epoch: 7 Global Step: 87640 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:37,888-Speed 3257.43 samples/sec Loss 4.6237 LearningRate 0.0419 Epoch: 7 Global Step: 87650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:40,981-Speed 3311.22 samples/sec Loss 4.6479 LearningRate 0.0419 Epoch: 7 Global Step: 87660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:17:44,101-Speed 3284.09 samples/sec Loss 4.6350 LearningRate 0.0419 Epoch: 7 Global Step: 87670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:47,230-Speed 3272.93 samples/sec Loss 4.5902 LearningRate 0.0419 Epoch: 7 Global Step: 87680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:50,307-Speed 3329.07 samples/sec Loss 4.5299 LearningRate 0.0419 Epoch: 7 Global Step: 87690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:53,353-Speed 3363.98 samples/sec Loss 4.5121 LearningRate 0.0419 Epoch: 7 Global Step: 87700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:56,478-Speed 3277.75 samples/sec Loss 4.5883 LearningRate 0.0419 Epoch: 7 Global Step: 87710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:17:59,531-Speed 3355.00 samples/sec Loss 4.5798 LearningRate 0.0418 Epoch: 7 Global Step: 87720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:02,650-Speed 3284.28 samples/sec Loss 4.6214 LearningRate 0.0418 Epoch: 7 Global Step: 87730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:05,805-Speed 3246.22 samples/sec Loss 4.5880 LearningRate 0.0418 Epoch: 7 Global Step: 87740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:08,958-Speed 3249.26 samples/sec Loss 4.6115 LearningRate 0.0418 Epoch: 7 
Global Step: 87750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:12,052-Speed 3310.62 samples/sec Loss 4.6105 LearningRate 0.0418 Epoch: 7 Global Step: 87760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:15,107-Speed 3353.11 samples/sec Loss 4.6040 LearningRate 0.0418 Epoch: 7 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:18:18,191-Speed 3321.39 samples/sec Loss 4.6142 LearningRate 0.0418 Epoch: 7 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:18:21,270-Speed 3325.99 samples/sec Loss 4.6134 LearningRate 0.0418 Epoch: 7 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:18:24,345-Speed 3332.07 samples/sec Loss 4.6066 LearningRate 0.0418 Epoch: 7 Global Step: 87800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:27,442-Speed 3306.52 samples/sec Loss 4.5687 LearningRate 0.0418 Epoch: 7 Global Step: 87810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:30,573-Speed 3272.00 samples/sec Loss 4.6406 LearningRate 0.0418 Epoch: 7 Global Step: 87820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:33,622-Speed 3359.21 samples/sec Loss 4.4928 LearningRate 0.0418 Epoch: 7 Global Step: 87830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:36,734-Speed 3292.23 samples/sec Loss 4.7234 LearningRate 0.0418 Epoch: 7 Global Step: 87840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:39,860-Speed 3276.12 samples/sec Loss 4.7061 LearningRate 0.0418 Epoch: 7 Global Step: 87850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:42,990-Speed 3272.59 samples/sec Loss 4.6444 LearningRate 0.0418 Epoch: 7 Global Step: 87860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:46,048-Speed 3349.66 samples/sec Loss 4.5687 LearningRate 0.0418 Epoch: 7 Global Step: 87870 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:18:49,149-Speed 3302.83 samples/sec Loss 4.6275 LearningRate 0.0418 Epoch: 7 Global Step: 87880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:52,286-Speed 3266.28 samples/sec Loss 4.5924 LearningRate 0.0418 Epoch: 7 Global Step: 87890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:18:55,395-Speed 3294.06 samples/sec Loss 4.6094 LearningRate 0.0418 Epoch: 7 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:18:58,434-Speed 3370.91 samples/sec Loss 4.7148 LearningRate 0.0417 Epoch: 7 Global Step: 87910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:01,502-Speed 3338.22 samples/sec Loss 4.6194 LearningRate 0.0417 Epoch: 7 Global Step: 87920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:04,627-Speed 3278.33 samples/sec Loss 4.6181 LearningRate 0.0417 Epoch: 7 Global Step: 87930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:07,746-Speed 3284.25 samples/sec Loss 4.5208 LearningRate 0.0417 Epoch: 7 Global Step: 87940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:10,805-Speed 3348.45 samples/sec Loss 4.6883 LearningRate 0.0417 Epoch: 7 Global Step: 87950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:13,892-Speed 3318.72 samples/sec Loss 4.6131 LearningRate 0.0417 Epoch: 7 Global Step: 87960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:16,948-Speed 3351.16 samples/sec Loss 4.7564 LearningRate 0.0417 Epoch: 7 Global Step: 87970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:20,045-Speed 3307.23 samples/sec Loss 4.6535 LearningRate 0.0417 Epoch: 7 Global Step: 87980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:23,160-Speed 3288.75 samples/sec Loss 4.6872 LearningRate 0.0417 Epoch: 7 Global Step: 87990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:26,245-Speed 3320.15 
samples/sec Loss 4.6040 LearningRate 0.0417 Epoch: 7 Global Step: 88000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:29,416-Speed 3229.81 samples/sec Loss 4.6995 LearningRate 0.0417 Epoch: 7 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:19:32,489-Speed 3333.88 samples/sec Loss 4.6830 LearningRate 0.0417 Epoch: 7 Global Step: 88020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:35,597-Speed 3296.40 samples/sec Loss 4.7714 LearningRate 0.0417 Epoch: 7 Global Step: 88030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:38,767-Speed 3231.05 samples/sec Loss 4.6987 LearningRate 0.0417 Epoch: 7 Global Step: 88040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:41,886-Speed 3284.15 samples/sec Loss 4.7267 LearningRate 0.0417 Epoch: 7 Global Step: 88050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:44,953-Speed 3339.90 samples/sec Loss 4.7659 LearningRate 0.0417 Epoch: 7 Global Step: 88060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:48,135-Speed 3219.05 samples/sec Loss 4.6931 LearningRate 0.0417 Epoch: 7 Global Step: 88070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:51,337-Speed 3199.01 samples/sec Loss 4.5968 LearningRate 0.0417 Epoch: 7 Global Step: 88080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:54,396-Speed 3348.07 samples/sec Loss 4.6587 LearningRate 0.0417 Epoch: 7 Global Step: 88090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:19:57,453-Speed 3351.34 samples/sec Loss 4.6855 LearningRate 0.0416 Epoch: 7 Global Step: 88100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:00,557-Speed 3299.97 samples/sec Loss 4.6947 LearningRate 0.0416 Epoch: 7 Global Step: 88110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:03,691-Speed 3267.69 samples/sec Loss 4.7096 LearningRate 0.0416 Epoch: 7 
Global Step: 88120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:20:06,802-Speed 3292.96 samples/sec Loss 4.6450 LearningRate 0.0416 Epoch: 7 Global Step: 88130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:09,904-Speed 3302.13 samples/sec Loss 4.6427 LearningRate 0.0416 Epoch: 7 Global Step: 88140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:13,082-Speed 3223.21 samples/sec Loss 4.6735 LearningRate 0.0416 Epoch: 7 Global Step: 88150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:16,200-Speed 3285.66 samples/sec Loss 4.7372 LearningRate 0.0416 Epoch: 7 Global Step: 88160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:19,288-Speed 3317.08 samples/sec Loss 4.6909 LearningRate 0.0416 Epoch: 7 Global Step: 88170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:22,376-Speed 3317.18 samples/sec Loss 4.6517 LearningRate 0.0416 Epoch: 7 Global Step: 88180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:25,561-Speed 3216.28 samples/sec Loss 4.7107 LearningRate 0.0416 Epoch: 7 Global Step: 88190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:28,656-Speed 3309.01 samples/sec Loss 4.7867 LearningRate 0.0416 Epoch: 7 Global Step: 88200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:31,755-Speed 3305.56 samples/sec Loss 4.6785 LearningRate 0.0416 Epoch: 7 Global Step: 88210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:34,855-Speed 3304.52 samples/sec Loss 4.6035 LearningRate 0.0416 Epoch: 7 Global Step: 88220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:38,026-Speed 3229.80 samples/sec Loss 4.6138 LearningRate 0.0416 Epoch: 7 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:20:41,198-Speed 3229.51 samples/sec Loss 4.7255 LearningRate 0.0416 Epoch: 7 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-27 09:20:44,326-Speed 3274.96 samples/sec Loss 4.7663 LearningRate 0.0416 Epoch: 7 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:20:47,380-Speed 3353.27 samples/sec Loss 4.6008 LearningRate 0.0416 Epoch: 7 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:20:50,508-Speed 3275.08 samples/sec Loss 4.6859 LearningRate 0.0416 Epoch: 7 Global Step: 88270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:53,714-Speed 3194.82 samples/sec Loss 4.8083 LearningRate 0.0416 Epoch: 7 Global Step: 88280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:56,812-Speed 3305.84 samples/sec Loss 4.7123 LearningRate 0.0416 Epoch: 7 Global Step: 88290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:20:59,878-Speed 3341.36 samples/sec Loss 4.5975 LearningRate 0.0415 Epoch: 7 Global Step: 88300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:21:03,000-Speed 3281.55 samples/sec Loss 4.7380 LearningRate 0.0415 Epoch: 7 Global Step: 88310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:21:06,190-Speed 3210.26 samples/sec Loss 4.7803 LearningRate 0.0415 Epoch: 7 Global Step: 88320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:21:09,312-Speed 3282.72 samples/sec Loss 4.7167 LearningRate 0.0415 Epoch: 7 Global Step: 88330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:21:12,411-Speed 3304.96 samples/sec Loss 4.8052 LearningRate 0.0415 Epoch: 7 Global Step: 88340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:21:15,565-Speed 3247.12 samples/sec Loss 4.7201 LearningRate 0.0415 Epoch: 7 Global Step: 88350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:21:18,695-Speed 3273.30 samples/sec Loss 4.7880 LearningRate 0.0415 Epoch: 7 Global Step: 88360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:21:21,761-Speed 3340.42 
samples/sec Loss 4.6549 LearningRate 0.0415 Epoch: 7 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:24,848-Speed 3318.68 samples/sec Loss 4.8436 LearningRate 0.0415 Epoch: 7 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:27,931-Speed 3321.64 samples/sec Loss 4.7498 LearningRate 0.0415 Epoch: 7 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:30,990-Speed 3349.10 samples/sec Loss 4.7610 LearningRate 0.0415 Epoch: 7 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:34,101-Speed 3292.55 samples/sec Loss 4.7729 LearningRate 0.0415 Epoch: 7 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:37,252-Speed 3251.39 samples/sec Loss 4.7658 LearningRate 0.0415 Epoch: 7 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:40,330-Speed 3326.97 samples/sec Loss 4.7889 LearningRate 0.0415 Epoch: 7 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:43,429-Speed 3305.88 samples/sec Loss 4.7572 LearningRate 0.0415 Epoch: 7 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:46,530-Speed 3303.33 samples/sec Loss 4.7323 LearningRate 0.0415 Epoch: 7 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:49,617-Speed 3318.34 samples/sec Loss 4.6870 LearningRate 0.0415 Epoch: 7 Global Step: 88460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:52,705-Speed 3316.57 samples/sec Loss 4.8691 LearningRate 0.0415 Epoch: 7 Global Step: 88470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:55,804-Speed 3305.89 samples/sec Loss 4.6878 LearningRate 0.0415 Epoch: 7 Global Step: 88480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:21:58,872-Speed 3338.11 samples/sec Loss 4.7807 LearningRate 0.0414 Epoch: 7 
Global Step: 88490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:22:02,013-Speed 3260.82 samples/sec Loss 4.7790 LearningRate 0.0414 Epoch: 7 Global Step: 88500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:05,107-Speed 3311.42 samples/sec Loss 4.8345 LearningRate 0.0414 Epoch: 7 Global Step: 88510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:08,181-Speed 3332.59 samples/sec Loss 4.7258 LearningRate 0.0414 Epoch: 7 Global Step: 88520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:11,231-Speed 3357.41 samples/sec Loss 4.8463 LearningRate 0.0414 Epoch: 7 Global Step: 88530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:14,336-Speed 3299.38 samples/sec Loss 4.7273 LearningRate 0.0414 Epoch: 7 Global Step: 88540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:17,422-Speed 3319.38 samples/sec Loss 4.8587 LearningRate 0.0414 Epoch: 7 Global Step: 88550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:20,491-Speed 3337.47 samples/sec Loss 4.8557 LearningRate 0.0414 Epoch: 7 Global Step: 88560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:23,566-Speed 3331.45 samples/sec Loss 4.8272 LearningRate 0.0414 Epoch: 7 Global Step: 88570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:26,657-Speed 3314.39 samples/sec Loss 4.7189 LearningRate 0.0414 Epoch: 7 Global Step: 88580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:29,783-Speed 3276.34 samples/sec Loss 4.7700 LearningRate 0.0414 Epoch: 7 Global Step: 88590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:32,891-Speed 3295.35 samples/sec Loss 4.7544 LearningRate 0.0414 Epoch: 7 Global Step: 88600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:22:35,953-Speed 3345.65 samples/sec Loss 4.7780 LearningRate 0.0414 Epoch: 7 Global Step: 88610 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:22:39,037-Speed 3321.52 samples/sec Loss 4.7745 LearningRate 0.0414 Epoch: 7 Global Step: 88620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:22:42,093-Speed 3352.40 samples/sec Loss 4.8089 LearningRate 0.0414 Epoch: 7 Global Step: 88630 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:22:45,168-Speed 3330.32 samples/sec Loss 4.6927 LearningRate 0.0414 Epoch: 7 Global Step: 88640 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:22:48,349-Speed 3220.64 samples/sec Loss 4.8174 LearningRate 0.0414 Epoch: 7 Global Step: 88650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:22:51,495-Speed 3255.48 samples/sec Loss 4.7905 LearningRate 0.0414 Epoch: 7 Global Step: 88660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:22:54,714-Speed 3181.95 samples/sec Loss 4.8234 LearningRate 0.0414 Epoch: 7 Global Step: 88670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:22:57,785-Speed 3336.17 samples/sec Loss 4.8604 LearningRate 0.0413 Epoch: 7 Global Step: 88680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:23:00,832-Speed 3361.67 samples/sec Loss 4.6741 LearningRate 0.0413 Epoch: 7 Global Step: 88690 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:23:03,991-Speed 3242.52 samples/sec Loss 4.8437 LearningRate 0.0413 Epoch: 7 Global Step: 88700 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:23:07,061-Speed 3336.72 samples/sec Loss 4.7878 LearningRate 0.0413 Epoch: 7 Global Step: 88710 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:23:10,116-Speed 3353.10 samples/sec Loss 4.7397 LearningRate 0.0413 Epoch: 7 Global Step: 88720 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:23:13,222-Speed 3297.82 samples/sec Loss 4.8250 LearningRate 0.0413 Epoch: 7 Global Step: 88730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:16,417-Speed 3205.92 
samples/sec Loss 4.8293 LearningRate 0.0413 Epoch: 7 Global Step: 88740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:19,527-Speed 3294.18 samples/sec Loss 4.8510 LearningRate 0.0413 Epoch: 7 Global Step: 88750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:22,613-Speed 3318.94 samples/sec Loss 4.8746 LearningRate 0.0413 Epoch: 7 Global Step: 88760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:25,697-Speed 3321.38 samples/sec Loss 4.8599 LearningRate 0.0413 Epoch: 7 Global Step: 88770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:28,768-Speed 3336.30 samples/sec Loss 4.8096 LearningRate 0.0413 Epoch: 7 Global Step: 88780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:31,870-Speed 3302.26 samples/sec Loss 4.8289 LearningRate 0.0413 Epoch: 7 Global Step: 88790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:34,979-Speed 3294.36 samples/sec Loss 4.8813 LearningRate 0.0413 Epoch: 7 Global Step: 88800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:38,038-Speed 3349.07 samples/sec Loss 4.8496 LearningRate 0.0413 Epoch: 7 Global Step: 88810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:41,166-Speed 3274.18 samples/sec Loss 4.7791 LearningRate 0.0413 Epoch: 7 Global Step: 88820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:44,230-Speed 3343.63 samples/sec Loss 4.8600 LearningRate 0.0413 Epoch: 7 Global Step: 88830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:23:47,361-Speed 3271.14 samples/sec Loss 4.8136 LearningRate 0.0413 Epoch: 7 Global Step: 88840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:50,470-Speed 3294.66 samples/sec Loss 4.9047 LearningRate 0.0413 Epoch: 7 Global Step: 88850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:53,602-Speed 3271.35 samples/sec Loss 4.8529 LearningRate 0.0413 Epoch: 7 
Global Step: 88860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:56,668-Speed 3340.78 samples/sec Loss 4.9079 LearningRate 0.0412 Epoch: 7 Global Step: 88870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:23:59,755-Speed 3318.41 samples/sec Loss 4.8808 LearningRate 0.0412 Epoch: 7 Global Step: 88880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:02,839-Speed 3321.36 samples/sec Loss 4.7466 LearningRate 0.0412 Epoch: 7 Global Step: 88890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:05,913-Speed 3332.19 samples/sec Loss 4.9036 LearningRate 0.0412 Epoch: 7 Global Step: 88900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:09,002-Speed 3315.45 samples/sec Loss 4.7820 LearningRate 0.0412 Epoch: 7 Global Step: 88910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:12,088-Speed 3320.10 samples/sec Loss 4.9345 LearningRate 0.0412 Epoch: 7 Global Step: 88920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:15,145-Speed 3351.30 samples/sec Loss 4.7827 LearningRate 0.0412 Epoch: 7 Global Step: 88930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:18,214-Speed 3337.62 samples/sec Loss 4.9541 LearningRate 0.0412 Epoch: 7 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:24:21,244-Speed 3380.58 samples/sec Loss 4.8510 LearningRate 0.0412 Epoch: 7 Global Step: 88950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:24,322-Speed 3326.97 samples/sec Loss 4.8775 LearningRate 0.0412 Epoch: 7 Global Step: 88960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:27,463-Speed 3261.30 samples/sec Loss 4.8709 LearningRate 0.0412 Epoch: 7 Global Step: 88970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:30,606-Speed 3259.17 samples/sec Loss 4.8905 LearningRate 0.0412 Epoch: 7 Global Step: 88980 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:24:33,709-Speed 3301.23 samples/sec Loss 4.9111 LearningRate 0.0412 Epoch: 7 Global Step: 88990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:36,827-Speed 3285.56 samples/sec Loss 4.7962 LearningRate 0.0412 Epoch: 7 Global Step: 89000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:39,877-Speed 3358.46 samples/sec Loss 4.8298 LearningRate 0.0412 Epoch: 7 Global Step: 89010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:43,013-Speed 3265.94 samples/sec Loss 4.8617 LearningRate 0.0412 Epoch: 7 Global Step: 89020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:46,073-Speed 3347.55 samples/sec Loss 4.8997 LearningRate 0.0412 Epoch: 7 Global Step: 89030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:49,130-Speed 3351.01 samples/sec Loss 4.8317 LearningRate 0.0412 Epoch: 7 Global Step: 89040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:24:52,258-Speed 3274.60 samples/sec Loss 4.9204 LearningRate 0.0412 Epoch: 7 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:24:55,417-Speed 3242.67 samples/sec Loss 4.8837 LearningRate 0.0412 Epoch: 7 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:24:58,469-Speed 3355.77 samples/sec Loss 4.9203 LearningRate 0.0411 Epoch: 7 Global Step: 89070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:01,567-Speed 3307.54 samples/sec Loss 4.7378 LearningRate 0.0411 Epoch: 7 Global Step: 89080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:04,650-Speed 3321.96 samples/sec Loss 4.9579 LearningRate 0.0411 Epoch: 7 Global Step: 89090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:07,793-Speed 3259.58 samples/sec Loss 4.9987 LearningRate 0.0411 Epoch: 7 Global Step: 89100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:10,940-Speed 3254.70 
samples/sec Loss 4.8869 LearningRate 0.0411 Epoch: 7 Global Step: 89110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:14,107-Speed 3234.58 samples/sec Loss 4.8748 LearningRate 0.0411 Epoch: 7 Global Step: 89120 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:17,259-Speed 3249.54 samples/sec Loss 4.9754 LearningRate 0.0411 Epoch: 7 Global Step: 89130 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:20,318-Speed 3348.50 samples/sec Loss 4.7540 LearningRate 0.0411 Epoch: 7 Global Step: 89140 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:23,411-Speed 3312.32 samples/sec Loss 4.8203 LearningRate 0.0411 Epoch: 7 Global Step: 89150 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:26,547-Speed 3266.21 samples/sec Loss 4.9159 LearningRate 0.0411 Epoch: 7 Global Step: 89160 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:29,637-Speed 3314.72 samples/sec Loss 4.9470 LearningRate 0.0411 Epoch: 7 Global Step: 89170 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:32,735-Speed 3306.46 samples/sec Loss 4.8908 LearningRate 0.0411 Epoch: 7 Global Step: 89180 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:35,817-Speed 3324.12 samples/sec Loss 4.9762 LearningRate 0.0411 Epoch: 7 Global Step: 89190 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:38,922-Speed 3298.18 samples/sec Loss 4.9138 LearningRate 0.0411 Epoch: 7 Global Step: 89200 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:42,075-Speed 3248.81 samples/sec Loss 4.7788 LearningRate 0.0411 Epoch: 7 Global Step: 89210 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:25:45,132-Speed 3351.02 samples/sec Loss 4.8832 LearningRate 0.0411 Epoch: 7 Global Step: 89220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:48,250-Speed 3285.11 samples/sec Loss 4.9514 LearningRate 0.0411 Epoch: 7 
Global Step: 89230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:51,362-Speed 3292.59 samples/sec Loss 4.8901 LearningRate 0.0411 Epoch: 7 Global Step: 89240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:54,504-Speed 3259.38 samples/sec Loss 5.0027 LearningRate 0.0411 Epoch: 7 Global Step: 89250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:25:57,606-Speed 3302.56 samples/sec Loss 4.8800 LearningRate 0.0410 Epoch: 7 Global Step: 89260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:00,709-Speed 3301.92 samples/sec Loss 4.9647 LearningRate 0.0410 Epoch: 7 Global Step: 89270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:03,785-Speed 3329.94 samples/sec Loss 4.8941 LearningRate 0.0410 Epoch: 7 Global Step: 89280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:06,852-Speed 3339.34 samples/sec Loss 4.9203 LearningRate 0.0410 Epoch: 7 Global Step: 89290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:09,917-Speed 3342.78 samples/sec Loss 4.9134 LearningRate 0.0410 Epoch: 7 Global Step: 89300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:13,825-Speed 2621.03 samples/sec Loss 4.9638 LearningRate 0.0410 Epoch: 7 Global Step: 89310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:16,945-Speed 3282.22 samples/sec Loss 4.8942 LearningRate 0.0410 Epoch: 7 Global Step: 89320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:20,036-Speed 3314.28 samples/sec Loss 4.9392 LearningRate 0.0410 Epoch: 7 Global Step: 89330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:23,120-Speed 3322.03 samples/sec Loss 4.9799 LearningRate 0.0410 Epoch: 7 Global Step: 89340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:26,195-Speed 3331.07 samples/sec Loss 4.9132 LearningRate 0.0410 Epoch: 7 Global Step: 89350 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:26:29,315-Speed 3281.84 samples/sec Loss 5.0119 LearningRate 0.0410 Epoch: 7 Global Step: 89360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:32,419-Speed 3300.27 samples/sec Loss 4.9295 LearningRate 0.0410 Epoch: 7 Global Step: 89370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:35,486-Speed 3340.49 samples/sec Loss 4.9897 LearningRate 0.0410 Epoch: 7 Global Step: 89380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:38,597-Speed 3292.70 samples/sec Loss 4.8858 LearningRate 0.0410 Epoch: 7 Global Step: 89390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:41,743-Speed 3255.39 samples/sec Loss 4.9615 LearningRate 0.0410 Epoch: 7 Global Step: 89400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:44,953-Speed 3191.72 samples/sec Loss 5.0121 LearningRate 0.0410 Epoch: 7 Global Step: 89410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:48,016-Speed 3343.58 samples/sec Loss 4.9708 LearningRate 0.0410 Epoch: 7 Global Step: 89420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:51,132-Speed 3287.35 samples/sec Loss 4.8618 LearningRate 0.0410 Epoch: 7 Global Step: 89430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:54,259-Speed 3276.11 samples/sec Loss 5.0734 LearningRate 0.0410 Epoch: 7 Global Step: 89440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:26:57,339-Speed 3325.23 samples/sec Loss 4.9420 LearningRate 0.0410 Epoch: 7 Global Step: 89450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:00,480-Speed 3261.73 samples/sec Loss 4.8850 LearningRate 0.0409 Epoch: 7 Global Step: 89460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:03,557-Speed 3329.05 samples/sec Loss 5.0001 LearningRate 0.0409 Epoch: 7 Global Step: 89470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:06,670-Speed 3290.46 
samples/sec Loss 5.0309 LearningRate 0.0409 Epoch: 7 Global Step: 89480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:09,749-Speed 3326.42 samples/sec Loss 4.9787 LearningRate 0.0409 Epoch: 7 Global Step: 89490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:12,822-Speed 3333.58 samples/sec Loss 4.8840 LearningRate 0.0409 Epoch: 7 Global Step: 89500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:15,983-Speed 3240.54 samples/sec Loss 4.9758 LearningRate 0.0409 Epoch: 7 Global Step: 89510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:19,098-Speed 3288.67 samples/sec Loss 4.8850 LearningRate 0.0409 Epoch: 7 Global Step: 89520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:22,180-Speed 3323.09 samples/sec Loss 5.0321 LearningRate 0.0409 Epoch: 7 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:25,323-Speed 3259.10 samples/sec Loss 4.8567 LearningRate 0.0409 Epoch: 7 Global Step: 89540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:28,467-Speed 3258.60 samples/sec Loss 5.0037 LearningRate 0.0409 Epoch: 7 Global Step: 89550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:31,654-Speed 3213.72 samples/sec Loss 4.9062 LearningRate 0.0409 Epoch: 7 Global Step: 89560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:34,807-Speed 3248.48 samples/sec Loss 4.9120 LearningRate 0.0409 Epoch: 7 Global Step: 89570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:37,884-Speed 3329.54 samples/sec Loss 4.9354 LearningRate 0.0409 Epoch: 7 Global Step: 89580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:40,961-Speed 3328.26 samples/sec Loss 4.9718 LearningRate 0.0409 Epoch: 7 Global Step: 89590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:44,097-Speed 3266.72 samples/sec Loss 4.9761 LearningRate 0.0409 Epoch: 7 
Global Step: 89600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:47,162-Speed 3342.45 samples/sec Loss 4.9828 LearningRate 0.0409 Epoch: 7 Global Step: 89610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:27:50,263-Speed 3302.98 samples/sec Loss 4.9537 LearningRate 0.0409 Epoch: 7 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:53,338-Speed 3330.92 samples/sec Loss 4.9491 LearningRate 0.0409 Epoch: 7 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:56,402-Speed 3343.18 samples/sec Loss 4.9545 LearningRate 0.0409 Epoch: 7 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:27:59,462-Speed 3347.91 samples/sec Loss 4.8947 LearningRate 0.0408 Epoch: 7 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:02,556-Speed 3310.41 samples/sec Loss 5.0411 LearningRate 0.0408 Epoch: 7 Global Step: 89660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:05,662-Speed 3297.23 samples/sec Loss 4.9217 LearningRate 0.0408 Epoch: 7 Global Step: 89670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:08,750-Speed 3317.00 samples/sec Loss 5.0157 LearningRate 0.0408 Epoch: 7 Global Step: 89680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:11,861-Speed 3292.85 samples/sec Loss 5.0140 LearningRate 0.0408 Epoch: 7 Global Step: 89690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:14,964-Speed 3301.74 samples/sec Loss 5.0029 LearningRate 0.0408 Epoch: 7 Global Step: 89700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:18,071-Speed 3296.17 samples/sec Loss 4.9288 LearningRate 0.0408 Epoch: 7 Global Step: 89710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:21,128-Speed 3350.35 samples/sec Loss 5.0421 LearningRate 0.0408 Epoch: 7 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-27 09:28:24,261-Speed 3270.70 samples/sec Loss 4.9309 LearningRate 0.0408 Epoch: 7 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:28:27,424-Speed 3238.15 samples/sec Loss 5.0359 LearningRate 0.0408 Epoch: 7 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:28:30,572-Speed 3253.91 samples/sec Loss 4.9228 LearningRate 0.0408 Epoch: 7 Global Step: 89750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:33,668-Speed 3308.79 samples/sec Loss 5.0627 LearningRate 0.0408 Epoch: 7 Global Step: 89760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:36,800-Speed 3269.65 samples/sec Loss 4.9511 LearningRate 0.0408 Epoch: 7 Global Step: 89770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:39,872-Speed 3334.24 samples/sec Loss 5.0766 LearningRate 0.0408 Epoch: 7 Global Step: 89780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:42,978-Speed 3298.67 samples/sec Loss 4.9948 LearningRate 0.0408 Epoch: 7 Global Step: 89790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:46,033-Speed 3353.38 samples/sec Loss 5.0400 LearningRate 0.0408 Epoch: 7 Global Step: 89800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:49,151-Speed 3285.08 samples/sec Loss 5.0051 LearningRate 0.0408 Epoch: 7 Global Step: 89810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:52,266-Speed 3288.55 samples/sec Loss 4.9102 LearningRate 0.0408 Epoch: 7 Global Step: 89820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:55,364-Speed 3305.59 samples/sec Loss 4.9136 LearningRate 0.0408 Epoch: 7 Global Step: 89830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:28:58,449-Speed 3320.28 samples/sec Loss 4.9188 LearningRate 0.0407 Epoch: 7 Global Step: 89840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:01,515-Speed 3340.76 
samples/sec Loss 5.1173 LearningRate 0.0407 Epoch: 7 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:29:04,591-Speed 3330.84 samples/sec Loss 4.9143 LearningRate 0.0407 Epoch: 7 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:29:07,672-Speed 3324.46 samples/sec Loss 5.0055 LearningRate 0.0407 Epoch: 7 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:29:10,751-Speed 3326.42 samples/sec Loss 5.0439 LearningRate 0.0407 Epoch: 7 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:29:13,844-Speed 3311.35 samples/sec Loss 4.9975 LearningRate 0.0407 Epoch: 7 Global Step: 89890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:16,937-Speed 3312.28 samples/sec Loss 4.9714 LearningRate 0.0407 Epoch: 7 Global Step: 89900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:20,060-Speed 3280.27 samples/sec Loss 4.9034 LearningRate 0.0407 Epoch: 7 Global Step: 89910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:23,147-Speed 3318.01 samples/sec Loss 4.9568 LearningRate 0.0407 Epoch: 7 Global Step: 89920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:26,255-Speed 3296.04 samples/sec Loss 5.0633 LearningRate 0.0407 Epoch: 7 Global Step: 89930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:29,317-Speed 3344.53 samples/sec Loss 5.0802 LearningRate 0.0407 Epoch: 7 Global Step: 89940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:32,407-Speed 3314.75 samples/sec Loss 5.0298 LearningRate 0.0407 Epoch: 7 Global Step: 89950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:35,532-Speed 3278.34 samples/sec Loss 4.9688 LearningRate 0.0407 Epoch: 7 Global Step: 89960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:38,721-Speed 3212.10 samples/sec Loss 5.0485 LearningRate 0.0407 Epoch: 7 
Global Step: 89970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:41,846-Speed 3277.78 samples/sec Loss 4.9520 LearningRate 0.0407 Epoch: 7 Global Step: 89980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:44,908-Speed 3344.78 samples/sec Loss 5.0364 LearningRate 0.0407 Epoch: 7 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:29:48,047-Speed 3263.58 samples/sec Loss 5.0401 LearningRate 0.0407 Epoch: 7 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:29:51,156-Speed 3294.33 samples/sec Loss 4.9954 LearningRate 0.0407 Epoch: 7 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:29:54,262-Speed 3297.99 samples/sec Loss 5.0705 LearningRate 0.0407 Epoch: 7 Global Step: 90020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:29:57,373-Speed 3293.15 samples/sec Loss 5.0454 LearningRate 0.0407 Epoch: 7 Global Step: 90030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:00,462-Speed 3316.16 samples/sec Loss 5.1792 LearningRate 0.0406 Epoch: 7 Global Step: 90040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:03,520-Speed 3349.10 samples/sec Loss 5.0701 LearningRate 0.0406 Epoch: 7 Global Step: 90050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:06,575-Speed 3353.45 samples/sec Loss 4.9523 LearningRate 0.0406 Epoch: 7 Global Step: 90060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:09,630-Speed 3353.27 samples/sec Loss 4.9099 LearningRate 0.0406 Epoch: 7 Global Step: 90070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:12,739-Speed 3294.41 samples/sec Loss 5.0844 LearningRate 0.0406 Epoch: 7 Global Step: 90080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:15,869-Speed 3272.73 samples/sec Loss 5.0988 LearningRate 0.0406 Epoch: 7 Global Step: 90090 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:30:18,971-Speed 3301.78 samples/sec Loss 5.0532 LearningRate 0.0406 Epoch: 7 Global Step: 90100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:22,031-Speed 3347.73 samples/sec Loss 5.0609 LearningRate 0.0406 Epoch: 7 Global Step: 90110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:25,159-Speed 3274.65 samples/sec Loss 5.0834 LearningRate 0.0406 Epoch: 7 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:30:28,271-Speed 3291.04 samples/sec Loss 4.9670 LearningRate 0.0406 Epoch: 7 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:30:31,335-Speed 3343.39 samples/sec Loss 4.9390 LearningRate 0.0406 Epoch: 7 Global Step: 90140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:34,449-Speed 3289.71 samples/sec Loss 4.9689 LearningRate 0.0406 Epoch: 7 Global Step: 90150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:37,607-Speed 3244.25 samples/sec Loss 5.1006 LearningRate 0.0406 Epoch: 7 Global Step: 90160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:40,785-Speed 3222.90 samples/sec Loss 5.0196 LearningRate 0.0406 Epoch: 7 Global Step: 90170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:43,912-Speed 3275.81 samples/sec Loss 5.0608 LearningRate 0.0406 Epoch: 7 Global Step: 90180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:47,025-Speed 3290.01 samples/sec Loss 5.0388 LearningRate 0.0406 Epoch: 7 Global Step: 90190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:50,167-Speed 3259.81 samples/sec Loss 5.0931 LearningRate 0.0406 Epoch: 7 Global Step: 90200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:53,233-Speed 3341.75 samples/sec Loss 5.0299 LearningRate 0.0406 Epoch: 7 Global Step: 90210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:56,289-Speed 3351.74 
samples/sec Loss 5.0825 LearningRate 0.0406 Epoch: 7 Global Step: 90220 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:30:59,370-Speed 3324.25 samples/sec Loss 4.9465 LearningRate 0.0405 Epoch: 7 Global Step: 90230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:02,462-Speed 3313.16 samples/sec Loss 5.0501 LearningRate 0.0405 Epoch: 7 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:31:05,554-Speed 3312.43 samples/sec Loss 5.0469 LearningRate 0.0405 Epoch: 7 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:31:08,619-Speed 3341.82 samples/sec Loss 5.0647 LearningRate 0.0405 Epoch: 7 Global Step: 90260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:11,674-Speed 3353.41 samples/sec Loss 5.0298 LearningRate 0.0405 Epoch: 7 Global Step: 90270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:14,750-Speed 3330.47 samples/sec Loss 4.9628 LearningRate 0.0405 Epoch: 7 Global Step: 90280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:17,829-Speed 3326.41 samples/sec Loss 5.0674 LearningRate 0.0405 Epoch: 7 Global Step: 90290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:20,876-Speed 3361.97 samples/sec Loss 5.0920 LearningRate 0.0405 Epoch: 7 Global Step: 90300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:23,977-Speed 3302.85 samples/sec Loss 5.0884 LearningRate 0.0405 Epoch: 7 Global Step: 90310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:27,107-Speed 3272.90 samples/sec Loss 5.0300 LearningRate 0.0405 Epoch: 7 Global Step: 90320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:30,184-Speed 3329.39 samples/sec Loss 5.0669 LearningRate 0.0405 Epoch: 7 Global Step: 90330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:33,300-Speed 3286.66 samples/sec Loss 5.0815 LearningRate 0.0405 Epoch: 7 
Global Step: 90340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:36,384-Speed 3322.07 samples/sec Loss 5.0666 LearningRate 0.0405 Epoch: 7 Global Step: 90350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:39,479-Speed 3308.58 samples/sec Loss 5.0709 LearningRate 0.0405 Epoch: 7 Global Step: 90360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:31:42,604-Speed 3279.07 samples/sec Loss 4.9891 LearningRate 0.0405 Epoch: 7 Global Step: 90370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:31:45,670-Speed 3340.09 samples/sec Loss 5.0739 LearningRate 0.0405 Epoch: 7 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:31:48,781-Speed 3293.28 samples/sec Loss 5.0082 LearningRate 0.0405 Epoch: 7 Global Step: 90390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:51,898-Speed 3286.27 samples/sec Loss 4.9679 LearningRate 0.0405 Epoch: 7 Global Step: 90400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:55,044-Speed 3255.66 samples/sec Loss 5.0024 LearningRate 0.0405 Epoch: 7 Global Step: 90410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:31:58,104-Speed 3347.65 samples/sec Loss 5.0627 LearningRate 0.0405 Epoch: 7 Global Step: 90420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:01,155-Speed 3356.78 samples/sec Loss 4.9806 LearningRate 0.0404 Epoch: 7 Global Step: 90430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:04,225-Speed 3337.78 samples/sec Loss 5.1086 LearningRate 0.0404 Epoch: 7 Global Step: 90440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:07,359-Speed 3267.37 samples/sec Loss 5.0310 LearningRate 0.0404 Epoch: 7 Global Step: 90450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:10,419-Speed 3347.26 samples/sec Loss 5.1250 LearningRate 0.0404 Epoch: 7 Global Step: 90460 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 09:32:13,484-Speed 3343.05 samples/sec Loss 5.0196 LearningRate 0.0404 Epoch: 7 Global Step: 90470 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:16,600-Speed 3287.07 samples/sec Loss 4.9100 LearningRate 0.0404 Epoch: 7 Global Step: 90480 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:19,727-Speed 3275.46 samples/sec Loss 5.0331 LearningRate 0.0404 Epoch: 7 Global Step: 90490 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:22,817-Speed 3314.96 samples/sec Loss 5.1088 LearningRate 0.0404 Epoch: 7 Global Step: 90500 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:26,013-Speed 3205.02 samples/sec Loss 5.0071 LearningRate 0.0404 Epoch: 7 Global Step: 90510 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:29,185-Speed 3230.09 samples/sec Loss 5.0691 LearningRate 0.0404 Epoch: 7 Global Step: 90520 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:32,297-Speed 3291.23 samples/sec Loss 5.0497 LearningRate 0.0404 Epoch: 7 Global Step: 90530 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:35,420-Speed 3279.53 samples/sec Loss 5.0668 LearningRate 0.0404 Epoch: 7 Global Step: 90540 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:38,534-Speed 3290.05 samples/sec Loss 5.0987 LearningRate 0.0404 Epoch: 7 Global Step: 90550 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:41,630-Speed 3307.68 samples/sec Loss 5.1215 LearningRate 0.0404 Epoch: 7 Global Step: 90560 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 09:32:44,740-Speed 3294.04 samples/sec Loss 5.0621 LearningRate 0.0404 Epoch: 7 Global Step: 90570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:47,852-Speed 3291.27 samples/sec Loss 5.2395 LearningRate 0.0404 Epoch: 7 Global Step: 90580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:51,016-Speed 3237.16 
samples/sec Loss 5.0541 LearningRate 0.0404 Epoch: 7 Global Step: 90590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:54,198-Speed 3219.51 samples/sec Loss 5.0789 LearningRate 0.0404 Epoch: 7 Global Step: 90600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:32:57,263-Speed 3341.71 samples/sec Loss 5.0965 LearningRate 0.0404 Epoch: 7 Global Step: 90610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:00,397-Speed 3269.16 samples/sec Loss 5.1431 LearningRate 0.0403 Epoch: 7 Global Step: 90620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:03,553-Speed 3246.01 samples/sec Loss 4.9978 LearningRate 0.0403 Epoch: 7 Global Step: 90630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:06,703-Speed 3251.72 samples/sec Loss 5.1480 LearningRate 0.0403 Epoch: 7 Global Step: 90640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:09,756-Speed 3354.85 samples/sec Loss 5.0312 LearningRate 0.0403 Epoch: 7 Global Step: 90650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:12,824-Speed 3338.19 samples/sec Loss 5.1091 LearningRate 0.0403 Epoch: 7 Global Step: 90660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:15,895-Speed 3335.74 samples/sec Loss 5.0870 LearningRate 0.0403 Epoch: 7 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:33:19,053-Speed 3244.04 samples/sec Loss 5.0952 LearningRate 0.0403 Epoch: 7 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:33:22,104-Speed 3356.67 samples/sec Loss 5.1019 LearningRate 0.0403 Epoch: 7 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:33:25,248-Speed 3258.21 samples/sec Loss 5.1427 LearningRate 0.0403 Epoch: 7 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:33:28,337-Speed 3316.32 samples/sec Loss 5.1100 LearningRate 0.0403 Epoch: 7 
Global Step: 90710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:33:31,440-Speed 3300.56 samples/sec Loss 5.0773 LearningRate 0.0403 Epoch: 7 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:33:34,497-Speed 3350.29 samples/sec Loss 5.1685 LearningRate 0.0403 Epoch: 7 Global Step: 90730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:37,625-Speed 3275.92 samples/sec Loss 5.1468 LearningRate 0.0403 Epoch: 7 Global Step: 90740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:40,763-Speed 3263.48 samples/sec Loss 5.1656 LearningRate 0.0403 Epoch: 7 Global Step: 90750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:43,901-Speed 3264.85 samples/sec Loss 5.0812 LearningRate 0.0403 Epoch: 7 Global Step: 90760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:47,011-Speed 3293.37 samples/sec Loss 4.9717 LearningRate 0.0403 Epoch: 7 Global Step: 90770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:50,087-Speed 3329.58 samples/sec Loss 5.1530 LearningRate 0.0403 Epoch: 7 Global Step: 90780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:53,212-Speed 3277.82 samples/sec Loss 5.0950 LearningRate 0.0403 Epoch: 7 Global Step: 90790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:56,338-Speed 3276.30 samples/sec Loss 5.0863 LearningRate 0.0403 Epoch: 7 Global Step: 90800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:33:59,412-Speed 3332.86 samples/sec Loss 5.1040 LearningRate 0.0403 Epoch: 7 Global Step: 90810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:02,577-Speed 3235.75 samples/sec Loss 5.0897 LearningRate 0.0402 Epoch: 7 Global Step: 90820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:05,661-Speed 3321.54 samples/sec Loss 5.0556 LearningRate 0.0402 Epoch: 7 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-27 09:34:08,756-Speed 3310.05 samples/sec Loss 5.1794 LearningRate 0.0402 Epoch: 7 Global Step: 90840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:11,865-Speed 3294.37 samples/sec Loss 5.0677 LearningRate 0.0402 Epoch: 7 Global Step: 90850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:14,988-Speed 3279.98 samples/sec Loss 5.1259 LearningRate 0.0402 Epoch: 7 Global Step: 90860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:18,062-Speed 3332.65 samples/sec Loss 5.0644 LearningRate 0.0402 Epoch: 7 Global Step: 90870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:21,145-Speed 3322.15 samples/sec Loss 5.0665 LearningRate 0.0402 Epoch: 7 Global Step: 90880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:24,261-Speed 3287.31 samples/sec Loss 5.1304 LearningRate 0.0402 Epoch: 7 Global Step: 90890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:27,423-Speed 3239.40 samples/sec Loss 5.0744 LearningRate 0.0402 Epoch: 7 Global Step: 90900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:30,493-Speed 3336.13 samples/sec Loss 5.1497 LearningRate 0.0402 Epoch: 7 Global Step: 90910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:33,566-Speed 3333.76 samples/sec Loss 5.1504 LearningRate 0.0402 Epoch: 7 Global Step: 90920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:36,660-Speed 3311.06 samples/sec Loss 5.0414 LearningRate 0.0402 Epoch: 7 Global Step: 90930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 09:34:39,775-Speed 3288.62 samples/sec Loss 5.2194 LearningRate 0.0402 Epoch: 7 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 09:34:42,836-Speed 3346.22 samples/sec Loss 5.1038 LearningRate 0.0402 Epoch: 7 Global Step: 90950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:34:45,936-Speed 3303.69 
samples/sec Loss 5.1934 LearningRate 0.0402 Epoch: 7 Global Step: 90960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:34:49,019-Speed 3322.61 samples/sec Loss 5.1042 LearningRate 0.0402 Epoch: 7 Global Step: 90970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:34:52,197-Speed 3223.49 samples/sec Loss 5.0785 LearningRate 0.0402 Epoch: 7 Global Step: 90980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:34:55,301-Speed 3299.73 samples/sec Loss 5.2221 LearningRate 0.0402 Epoch: 7 Global Step: 90990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:34:58,383-Speed 3323.33 samples/sec Loss 5.1319 LearningRate 0.0402 Epoch: 7 Global Step: 91000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:01,487-Speed 3300.65 samples/sec Loss 5.0167 LearningRate 0.0402 Epoch: 7 Global Step: 91010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:04,613-Speed 3276.32 samples/sec Loss 5.1239 LearningRate 0.0401 Epoch: 7 Global Step: 91020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:07,721-Speed 3296.18 samples/sec Loss 5.1612 LearningRate 0.0401 Epoch: 7 Global Step: 91030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:10,806-Speed 3319.83 samples/sec Loss 5.1606 LearningRate 0.0401 Epoch: 7 Global Step: 91040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:13,978-Speed 3229.68 samples/sec Loss 5.1656 LearningRate 0.0401 Epoch: 7 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:35:17,147-Speed 3232.20 samples/sec Loss 5.0767 LearningRate 0.0401 Epoch: 7 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:35:20,265-Speed 3285.81 samples/sec Loss 5.2210 LearningRate 0.0401 Epoch: 7 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:35:23,426-Speed 3240.49 samples/sec Loss 5.1346 LearningRate 0.0401 Epoch: 7 
Global Step: 91080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:35:26,534-Speed 3295.22 samples/sec Loss 5.0801 LearningRate 0.0401 Epoch: 7 Global Step: 91090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:29,609-Speed 3331.60 samples/sec Loss 5.0568 LearningRate 0.0401 Epoch: 7 Global Step: 91100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:32,741-Speed 3270.70 samples/sec Loss 5.1189 LearningRate 0.0401 Epoch: 7 Global Step: 91110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:35,910-Speed 3232.42 samples/sec Loss 5.1265 LearningRate 0.0401 Epoch: 7 Global Step: 91120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:39,060-Speed 3251.87 samples/sec Loss 5.0421 LearningRate 0.0401 Epoch: 7 Global Step: 91130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:42,184-Speed 3278.98 samples/sec Loss 5.1294 LearningRate 0.0401 Epoch: 7 Global Step: 91140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:45,277-Speed 3311.55 samples/sec Loss 5.1871 LearningRate 0.0401 Epoch: 7 Global Step: 91150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:48,443-Speed 3235.15 samples/sec Loss 5.1489 LearningRate 0.0401 Epoch: 7 Global Step: 91160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:51,570-Speed 3275.20 samples/sec Loss 5.1316 LearningRate 0.0401 Epoch: 7 Global Step: 91170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:54,730-Speed 3241.83 samples/sec Loss 5.1217 LearningRate 0.0401 Epoch: 7 Global Step: 91180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:35:57,848-Speed 3285.75 samples/sec Loss 5.1403 LearningRate 0.0401 Epoch: 7 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:36:00,929-Speed 3324.48 samples/sec Loss 5.0646 LearningRate 0.0401 Epoch: 7 Global Step: 91200 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:36:04,045-Speed 3286.94 samples/sec Loss 5.1312 LearningRate 0.0400 Epoch: 7 Global Step: 91210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:07,169-Speed 3278.64 samples/sec Loss 5.1656 LearningRate 0.0400 Epoch: 7 Global Step: 91220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:10,238-Speed 3338.06 samples/sec Loss 5.1498 LearningRate 0.0400 Epoch: 7 Global Step: 91230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:13,360-Speed 3281.11 samples/sec Loss 5.1172 LearningRate 0.0400 Epoch: 7 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:16,461-Speed 3303.05 samples/sec Loss 5.1109 LearningRate 0.0400 Epoch: 7 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:19,574-Speed 3291.06 samples/sec Loss 5.2195 LearningRate 0.0400 Epoch: 7 Global Step: 91260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:22,664-Speed 3315.02 samples/sec Loss 5.0758 LearningRate 0.0400 Epoch: 7 Global Step: 91270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:25,868-Speed 3196.71 samples/sec Loss 5.1777 LearningRate 0.0400 Epoch: 7 Global Step: 91280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:28,994-Speed 3276.73 samples/sec Loss 5.1528 LearningRate 0.0400 Epoch: 7 Global Step: 91290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:32,092-Speed 3306.25 samples/sec Loss 5.1229 LearningRate 0.0400 Epoch: 7 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:36:35,273-Speed 3220.05 samples/sec Loss 5.1697 LearningRate 0.0400 Epoch: 7 Global Step: 91310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:38,400-Speed 3276.33 samples/sec Loss 5.1947 LearningRate 0.0400 Epoch: 7 Global Step: 91320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:41,527-Speed 3275.50 
samples/sec Loss 5.1709 LearningRate 0.0400 Epoch: 7 Global Step: 91330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:44,594-Speed 3339.39 samples/sec Loss 5.1130 LearningRate 0.0400 Epoch: 7 Global Step: 91340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:47,670-Speed 3330.62 samples/sec Loss 5.2324 LearningRate 0.0400 Epoch: 7 Global Step: 91350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:50,818-Speed 3253.59 samples/sec Loss 5.1663 LearningRate 0.0400 Epoch: 7 Global Step: 91360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:53,956-Speed 3264.22 samples/sec Loss 5.2204 LearningRate 0.0400 Epoch: 7 Global Step: 91370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:36:57,013-Speed 3350.80 samples/sec Loss 5.0491 LearningRate 0.0400 Epoch: 7 Global Step: 91380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:00,162-Speed 3253.05 samples/sec Loss 5.1848 LearningRate 0.0400 Epoch: 7 Global Step: 91390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:03,320-Speed 3243.58 samples/sec Loss 5.1357 LearningRate 0.0400 Epoch: 7 Global Step: 91400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:06,476-Speed 3245.05 samples/sec Loss 5.1994 LearningRate 0.0399 Epoch: 7 Global Step: 91410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:09,555-Speed 3327.58 samples/sec Loss 5.0363 LearningRate 0.0399 Epoch: 7 Global Step: 91420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:12,667-Speed 3291.16 samples/sec Loss 5.1574 LearningRate 0.0399 Epoch: 7 Global Step: 91430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:15,782-Speed 3288.01 samples/sec Loss 5.1301 LearningRate 0.0399 Epoch: 7 Global Step: 91440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:18,904-Speed 3281.81 samples/sec Loss 5.1524 LearningRate 0.0399 Epoch: 7 
Global Step: 91450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:22,016-Speed 3291.14 samples/sec Loss 5.1995 LearningRate 0.0399 Epoch: 7 Global Step: 91460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:25,116-Speed 3304.39 samples/sec Loss 5.2130 LearningRate 0.0399 Epoch: 7 Global Step: 91470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:28,176-Speed 3347.04 samples/sec Loss 5.1269 LearningRate 0.0399 Epoch: 7 Global Step: 91480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:31,290-Speed 3289.48 samples/sec Loss 5.1688 LearningRate 0.0399 Epoch: 7 Global Step: 91490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:34,340-Speed 3358.49 samples/sec Loss 5.2279 LearningRate 0.0399 Epoch: 7 Global Step: 91500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:37,499-Speed 3242.53 samples/sec Loss 5.1333 LearningRate 0.0399 Epoch: 7 Global Step: 91510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:40,687-Speed 3213.49 samples/sec Loss 5.1212 LearningRate 0.0399 Epoch: 7 Global Step: 91520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:43,834-Speed 3255.12 samples/sec Loss 5.1567 LearningRate 0.0399 Epoch: 7 Global Step: 91530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:46,893-Speed 3348.13 samples/sec Loss 5.1822 LearningRate 0.0399 Epoch: 7 Global Step: 91540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:49,995-Speed 3301.80 samples/sec Loss 5.1212 LearningRate 0.0399 Epoch: 7 Global Step: 91550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:53,103-Speed 3296.65 samples/sec Loss 5.2512 LearningRate 0.0399 Epoch: 7 Global Step: 91560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:37:56,175-Speed 3334.22 samples/sec Loss 5.0713 LearningRate 0.0399 Epoch: 7 Global Step: 91570 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:37:59,241-Speed 3340.62 samples/sec Loss 5.2154 LearningRate 0.0399 Epoch: 7 Global Step: 91580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:38:02,339-Speed 3307.02 samples/sec Loss 5.2198 LearningRate 0.0399 Epoch: 7 Global Step: 91590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:38:05,418-Speed 3326.85 samples/sec Loss 5.2643 LearningRate 0.0399 Epoch: 7 Global Step: 91600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:38:08,489-Speed 3334.37 samples/sec Loss 5.1547 LearningRate 0.0398 Epoch: 7 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:38:11,602-Speed 3290.70 samples/sec Loss 5.2058 LearningRate 0.0398 Epoch: 7 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:38:14,719-Speed 3286.45 samples/sec Loss 5.0965 LearningRate 0.0398 Epoch: 7 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:38:17,940-Speed 3180.46 samples/sec Loss 5.1662 LearningRate 0.0398 Epoch: 7 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:38:21,017-Speed 3329.00 samples/sec Loss 5.2706 LearningRate 0.0398 Epoch: 7 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:38:24,103-Speed 3319.40 samples/sec Loss 5.2031 LearningRate 0.0398 Epoch: 7 Global Step: 91660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:38:27,205-Speed 3301.77 samples/sec Loss 5.1307 LearningRate 0.0398 Epoch: 7 Global Step: 91670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:38:30,327-Speed 3280.90 samples/sec Loss 5.0511 LearningRate 0.0398 Epoch: 7 Global Step: 91680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:38:33,383-Speed 3351.94 samples/sec Loss 5.2124 LearningRate 0.0398 Epoch: 7 Global Step: 91690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:36,500-Speed 3286.51 
samples/sec Loss 5.1556 LearningRate 0.0398 Epoch: 7 Global Step: 91700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:39,616-Speed 3287.10 samples/sec Loss 5.1049 LearningRate 0.0398 Epoch: 7 Global Step: 91710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:42,766-Speed 3251.26 samples/sec Loss 5.1792 LearningRate 0.0398 Epoch: 7 Global Step: 91720 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:45,874-Speed 3295.78 samples/sec Loss 5.1084 LearningRate 0.0398 Epoch: 7 Global Step: 91730 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:48,947-Speed 3334.12 samples/sec Loss 5.2362 LearningRate 0.0398 Epoch: 7 Global Step: 91740 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:52,047-Speed 3304.22 samples/sec Loss 5.3057 LearningRate 0.0398 Epoch: 7 Global Step: 91750 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:55,099-Speed 3355.98 samples/sec Loss 5.1200 LearningRate 0.0398 Epoch: 7 Global Step: 91760 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:38:58,151-Speed 3356.79 samples/sec Loss 5.2913 LearningRate 0.0398 Epoch: 7 Global Step: 91770 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:39:01,216-Speed 3342.10 samples/sec Loss 5.1883 LearningRate 0.0398 Epoch: 7 Global Step: 91780 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:39:04,303-Speed 3318.08 samples/sec Loss 5.2738 LearningRate 0.0398 Epoch: 7 Global Step: 91790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:07,422-Speed 3284.19 samples/sec Loss 5.2285 LearningRate 0.0397 Epoch: 7 Global Step: 91800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:10,518-Speed 3308.14 samples/sec Loss 5.1152 LearningRate 0.0397 Epoch: 7 Global Step: 91810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:13,649-Speed 3272.75 samples/sec Loss 5.2643 LearningRate 0.0397 Epoch: 7 
Global Step: 91820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:16,727-Speed 3327.11 samples/sec Loss 5.1512 LearningRate 0.0397 Epoch: 7 Global Step: 91830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:19,832-Speed 3299.86 samples/sec Loss 5.1970 LearningRate 0.0397 Epoch: 7 Global Step: 91840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:22,914-Speed 3323.02 samples/sec Loss 5.1513 LearningRate 0.0397 Epoch: 7 Global Step: 91850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:26,010-Speed 3308.69 samples/sec Loss 5.1243 LearningRate 0.0397 Epoch: 7 Global Step: 91860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:29,134-Speed 3278.61 samples/sec Loss 5.1731 LearningRate 0.0397 Epoch: 7 Global Step: 91870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:32,271-Speed 3265.30 samples/sec Loss 5.1647 LearningRate 0.0397 Epoch: 7 Global Step: 91880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:35,447-Speed 3226.01 samples/sec Loss 5.2174 LearningRate 0.0397 Epoch: 7 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:39:38,614-Speed 3233.43 samples/sec Loss 5.1928 LearningRate 0.0397 Epoch: 7 Global Step: 91900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:41,733-Speed 3284.52 samples/sec Loss 5.1771 LearningRate 0.0397 Epoch: 7 Global Step: 91910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:44,855-Speed 3280.47 samples/sec Loss 5.1330 LearningRate 0.0397 Epoch: 7 Global Step: 91920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:47,971-Speed 3288.06 samples/sec Loss 5.1049 LearningRate 0.0397 Epoch: 7 Global Step: 91930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:51,123-Speed 3248.97 samples/sec Loss 5.2060 LearningRate 0.0397 Epoch: 7 Global Step: 91940 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:39:54,260-Speed 3265.19 samples/sec Loss 5.1951 LearningRate 0.0397 Epoch: 7 Global Step: 91950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:39:57,349-Speed 3316.52 samples/sec Loss 5.1056 LearningRate 0.0397 Epoch: 7 Global Step: 91960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:40:00,432-Speed 3321.90 samples/sec Loss 5.1901 LearningRate 0.0397 Epoch: 7 Global Step: 91970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:40:03,540-Speed 3295.90 samples/sec Loss 5.2598 LearningRate 0.0397 Epoch: 7 Global Step: 91980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:40:06,643-Speed 3301.21 samples/sec Loss 5.2331 LearningRate 0.0397 Epoch: 7 Global Step: 91990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:40:09,757-Speed 3288.77 samples/sec Loss 5.1878 LearningRate 0.0396 Epoch: 7 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:40:12,876-Speed 3284.48 samples/sec Loss 5.1388 LearningRate 0.0396 Epoch: 7 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:40:15,983-Speed 3296.42 samples/sec Loss 5.2059 LearningRate 0.0396 Epoch: 7 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:40:19,083-Speed 3304.73 samples/sec Loss 5.2867 LearningRate 0.0396 Epoch: 7 Global Step: 92030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:40:22,146-Speed 3344.09 samples/sec Loss 5.1794 LearningRate 0.0396 Epoch: 7 Global Step: 92040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:40:25,237-Speed 3313.89 samples/sec Loss 5.2917 LearningRate 0.0396 Epoch: 7 Global Step: 92050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:40:28,376-Speed 3262.53 samples/sec Loss 5.3223 LearningRate 0.0396 Epoch: 7 Global Step: 92060 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:31,545-Speed 3233.06 
samples/sec Loss 5.2165 LearningRate 0.0396 Epoch: 7 Global Step: 92070 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:34,650-Speed 3298.79 samples/sec Loss 5.2611 LearningRate 0.0396 Epoch: 7 Global Step: 92080 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:37,763-Speed 3290.86 samples/sec Loss 5.1992 LearningRate 0.0396 Epoch: 7 Global Step: 92090 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:40,867-Speed 3299.62 samples/sec Loss 5.2102 LearningRate 0.0396 Epoch: 7 Global Step: 92100 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:43,946-Speed 3326.81 samples/sec Loss 5.0381 LearningRate 0.0396 Epoch: 7 Global Step: 92110 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:47,021-Speed 3331.78 samples/sec Loss 5.1864 LearningRate 0.0396 Epoch: 7 Global Step: 92120 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:50,186-Speed 3235.98 samples/sec Loss 5.2172 LearningRate 0.0396 Epoch: 7 Global Step: 92130 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:53,339-Speed 3248.35 samples/sec Loss 5.1539 LearningRate 0.0396 Epoch: 7 Global Step: 92140 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:56,449-Speed 3294.31 samples/sec Loss 5.2418 LearningRate 0.0396 Epoch: 7 Global Step: 92150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:40:59,566-Speed 3286.20 samples/sec Loss 5.2036 LearningRate 0.0396 Epoch: 7 Global Step: 92160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:02,671-Speed 3298.68 samples/sec Loss 5.2287 LearningRate 0.0396 Epoch: 7 Global Step: 92170 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:05,720-Speed 3359.68 samples/sec Loss 5.1514 LearningRate 0.0396 Epoch: 7 Global Step: 92180 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:08,820-Speed 3304.79 samples/sec Loss 5.1725 LearningRate 0.0396 Epoch: 7 
Global Step: 92190 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:11,929-Speed 3294.94 samples/sec Loss 5.1850 LearningRate 0.0395 Epoch: 7 Global Step: 92200 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:15,049-Speed 3283.46 samples/sec Loss 5.1668 LearningRate 0.0395 Epoch: 7 Global Step: 92210 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:18,156-Speed 3296.14 samples/sec Loss 5.2342 LearningRate 0.0395 Epoch: 7 Global Step: 92220 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:21,224-Speed 3339.08 samples/sec Loss 5.1222 LearningRate 0.0395 Epoch: 7 Global Step: 92230 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:24,325-Speed 3303.14 samples/sec Loss 5.1958 LearningRate 0.0395 Epoch: 7 Global Step: 92240 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:27,460-Speed 3267.57 samples/sec Loss 5.2282 LearningRate 0.0395 Epoch: 7 Global Step: 92250 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:30,548-Speed 3316.77 samples/sec Loss 5.2378 LearningRate 0.0395 Epoch: 7 Global Step: 92260 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:41:33,661-Speed 3291.26 samples/sec Loss 5.1734 LearningRate 0.0395 Epoch: 7 Global Step: 92270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:36,848-Speed 3213.33 samples/sec Loss 5.1328 LearningRate 0.0395 Epoch: 7 Global Step: 92280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:39,966-Speed 3285.30 samples/sec Loss 5.1820 LearningRate 0.0395 Epoch: 7 Global Step: 92290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:43,060-Speed 3310.79 samples/sec Loss 5.1532 LearningRate 0.0395 Epoch: 7 Global Step: 92300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:46,149-Speed 3316.27 samples/sec Loss 5.1475 LearningRate 0.0395 Epoch: 7 Global Step: 92310 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:41:49,211-Speed 3345.51 samples/sec Loss 5.1836 LearningRate 0.0395 Epoch: 7 Global Step: 92320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:52,327-Speed 3287.09 samples/sec Loss 5.2076 LearningRate 0.0395 Epoch: 7 Global Step: 92330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:55,532-Speed 3195.97 samples/sec Loss 5.2995 LearningRate 0.0395 Epoch: 7 Global Step: 92340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:41:58,655-Speed 3280.36 samples/sec Loss 5.3175 LearningRate 0.0395 Epoch: 7 Global Step: 92350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:42:01,762-Speed 3296.31 samples/sec Loss 5.1216 LearningRate 0.0395 Epoch: 7 Global Step: 92360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:42:04,975-Speed 3188.66 samples/sec Loss 5.2941 LearningRate 0.0395 Epoch: 7 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:08,112-Speed 3264.55 samples/sec Loss 5.3962 LearningRate 0.0395 Epoch: 7 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:11,207-Speed 3309.96 samples/sec Loss 5.2405 LearningRate 0.0394 Epoch: 7 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:14,307-Speed 3305.04 samples/sec Loss 5.2195 LearningRate 0.0394 Epoch: 7 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:17,395-Speed 3316.62 samples/sec Loss 5.3540 LearningRate 0.0394 Epoch: 7 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:20,466-Speed 3335.86 samples/sec Loss 5.1985 LearningRate 0.0394 Epoch: 7 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:23,569-Speed 3301.61 samples/sec Loss 5.2509 LearningRate 0.0394 Epoch: 7 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:26,631-Speed 3344.59 
samples/sec Loss 5.2192 LearningRate 0.0394 Epoch: 7 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:29,742-Speed 3292.55 samples/sec Loss 5.1793 LearningRate 0.0394 Epoch: 7 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:32,797-Speed 3353.52 samples/sec Loss 5.3144 LearningRate 0.0394 Epoch: 7 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:35,909-Speed 3291.65 samples/sec Loss 5.2310 LearningRate 0.0394 Epoch: 7 Global Step: 92470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 09:42:39,080-Speed 3230.24 samples/sec Loss 5.2383 LearningRate 0.0394 Epoch: 7 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:42,195-Speed 3287.85 samples/sec Loss 5.1607 LearningRate 0.0394 Epoch: 7 Global Step: 92490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:42:45,240-Speed 3364.18 samples/sec Loss 5.2420 LearningRate 0.0394 Epoch: 7 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:42:48,357-Speed 3286.41 samples/sec Loss 5.2084 LearningRate 0.0394 Epoch: 7 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:42:51,502-Speed 3256.79 samples/sec Loss 5.2270 LearningRate 0.0394 Epoch: 7 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:42:54,651-Speed 3253.28 samples/sec Loss 5.2016 LearningRate 0.0394 Epoch: 7 Global Step: 92530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:42:57,776-Speed 3278.35 samples/sec Loss 5.2539 LearningRate 0.0394 Epoch: 7 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:00,892-Speed 3287.07 samples/sec Loss 5.2764 LearningRate 0.0394 Epoch: 7 Global Step: 92550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:04,074-Speed 3218.72 samples/sec Loss 5.2869 LearningRate 0.0394 Epoch: 7 
Global Step: 92560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:07,154-Speed 3326.58 samples/sec Loss 5.2690 LearningRate 0.0394 Epoch: 7 Global Step: 92570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:10,229-Speed 3331.19 samples/sec Loss 5.2674 LearningRate 0.0394 Epoch: 7 Global Step: 92580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:13,386-Speed 3244.42 samples/sec Loss 5.2443 LearningRate 0.0393 Epoch: 7 Global Step: 92590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:16,567-Speed 3220.15 samples/sec Loss 5.2753 LearningRate 0.0393 Epoch: 7 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:43:19,698-Speed 3270.91 samples/sec Loss 5.2005 LearningRate 0.0393 Epoch: 7 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:43:22,738-Speed 3369.84 samples/sec Loss 5.2721 LearningRate 0.0393 Epoch: 7 Global Step: 92620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:25,799-Speed 3346.66 samples/sec Loss 5.1099 LearningRate 0.0393 Epoch: 7 Global Step: 92630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:43:28,833-Speed 3375.66 samples/sec Loss 5.2237 LearningRate 0.0393 Epoch: 7 Global Step: 92640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:31,915-Speed 3323.47 samples/sec Loss 5.1391 LearningRate 0.0393 Epoch: 7 Global Step: 92650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:34,996-Speed 3325.49 samples/sec Loss 5.3233 LearningRate 0.0393 Epoch: 7 Global Step: 92660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:38,050-Speed 3354.08 samples/sec Loss 5.1361 LearningRate 0.0393 Epoch: 7 Global Step: 92670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:41,130-Speed 3325.47 samples/sec Loss 5.2544 LearningRate 0.0393 Epoch: 7 Global Step: 92680 Fp16 Grad Scale: 16384 Required: 13 
hours Training: 2022-04-27 09:43:44,201-Speed 3335.68 samples/sec Loss 5.1475 LearningRate 0.0393 Epoch: 7 Global Step: 92690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:47,275-Speed 3331.95 samples/sec Loss 5.2022 LearningRate 0.0393 Epoch: 7 Global Step: 92700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:50,399-Speed 3278.82 samples/sec Loss 5.2326 LearningRate 0.0393 Epoch: 7 Global Step: 92710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:53,511-Speed 3290.94 samples/sec Loss 5.2166 LearningRate 0.0393 Epoch: 7 Global Step: 92720 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:56,575-Speed 3343.99 samples/sec Loss 5.2011 LearningRate 0.0393 Epoch: 7 Global Step: 92730 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:43:59,650-Speed 3330.32 samples/sec Loss 5.2958 LearningRate 0.0393 Epoch: 7 Global Step: 92740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:02,736-Speed 3319.17 samples/sec Loss 5.2349 LearningRate 0.0393 Epoch: 7 Global Step: 92750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:05,865-Speed 3273.55 samples/sec Loss 5.3324 LearningRate 0.0393 Epoch: 7 Global Step: 92760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:08,949-Speed 3322.21 samples/sec Loss 5.2950 LearningRate 0.0393 Epoch: 7 Global Step: 92770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:12,081-Speed 3269.69 samples/sec Loss 5.2770 LearningRate 0.0393 Epoch: 7 Global Step: 92780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:15,307-Speed 3175.44 samples/sec Loss 5.2225 LearningRate 0.0392 Epoch: 7 Global Step: 92790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:18,448-Speed 3261.88 samples/sec Loss 5.2675 LearningRate 0.0392 Epoch: 7 Global Step: 92800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:21,534-Speed 3319.34 
samples/sec Loss 5.2439 LearningRate 0.0392 Epoch: 7 Global Step: 92810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:24,696-Speed 3238.83 samples/sec Loss 5.2067 LearningRate 0.0392 Epoch: 7 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:27,766-Speed 3336.28 samples/sec Loss 5.1742 LearningRate 0.0392 Epoch: 7 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:30,800-Speed 3376.84 samples/sec Loss 5.1589 LearningRate 0.0392 Epoch: 7 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:33,861-Speed 3345.65 samples/sec Loss 5.3613 LearningRate 0.0392 Epoch: 7 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:36,959-Speed 3307.29 samples/sec Loss 5.2846 LearningRate 0.0392 Epoch: 7 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:40,011-Speed 3355.91 samples/sec Loss 5.2101 LearningRate 0.0392 Epoch: 7 Global Step: 92870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:43,153-Speed 3259.96 samples/sec Loss 5.2480 LearningRate 0.0392 Epoch: 7 Global Step: 92880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:46,258-Speed 3299.20 samples/sec Loss 5.3022 LearningRate 0.0392 Epoch: 7 Global Step: 92890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:49,368-Speed 3294.02 samples/sec Loss 5.2262 LearningRate 0.0392 Epoch: 7 Global Step: 92900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:52,473-Speed 3299.62 samples/sec Loss 5.2432 LearningRate 0.0392 Epoch: 7 Global Step: 92910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:55,607-Speed 3268.46 samples/sec Loss 5.2556 LearningRate 0.0392 Epoch: 7 Global Step: 92920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:44:58,730-Speed 3278.98 samples/sec Loss 5.2634 LearningRate 0.0392 Epoch: 7 
Global Step: 92930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:45:01,795-Speed 3342.26 samples/sec Loss 5.3150 LearningRate 0.0392 Epoch: 7 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:45:04,876-Speed 3325.91 samples/sec Loss 5.2210 LearningRate 0.0392 Epoch: 7 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:45:07,935-Speed 3347.88 samples/sec Loss 5.1904 LearningRate 0.0392 Epoch: 7 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:45:11,026-Speed 3313.41 samples/sec Loss 5.1845 LearningRate 0.0392 Epoch: 7 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:45:14,272-Speed 3155.67 samples/sec Loss 5.3341 LearningRate 0.0392 Epoch: 7 Global Step: 92980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:45:17,336-Speed 3343.38 samples/sec Loss 5.2281 LearningRate 0.0391 Epoch: 7 Global Step: 92990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:45:20,389-Speed 3355.70 samples/sec Loss 5.3708 LearningRate 0.0391 Epoch: 7 Global Step: 93000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:45:23,492-Speed 3300.33 samples/sec Loss 5.2789 LearningRate 0.0391 Epoch: 7 Global Step: 93010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:45:26,571-Speed 3327.64 samples/sec Loss 5.2067 LearningRate 0.0391 Epoch: 7 Global Step: 93020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:45:29,662-Speed 3313.82 samples/sec Loss 5.2189 LearningRate 0.0391 Epoch: 7 Global Step: 93030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:45:32,696-Speed 3375.49 samples/sec Loss 5.3565 LearningRate 0.0391 Epoch: 7 Global Step: 93040 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:45:35,896-Speed 3201.56 samples/sec Loss 5.2597 LearningRate 0.0391 Epoch: 7 Global Step: 93050 Fp16 Grad Scale: 16384 Required: 13 
hours Training: 2022-04-27 09:45:39,018-Speed 3280.75 samples/sec Loss 5.2196 LearningRate 0.0391 Epoch: 7 Global Step: 93060 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:45:42,238-Speed 3180.91 samples/sec Loss 5.3035 LearningRate 0.0391 Epoch: 7 Global Step: 93070 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:45:45,316-Speed 3328.27 samples/sec Loss 5.3005 LearningRate 0.0391 Epoch: 7 Global Step: 93080 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:45:48,385-Speed 3337.42 samples/sec Loss 5.1287 LearningRate 0.0391 Epoch: 7 Global Step: 93090 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:45:51,515-Speed 3272.11 samples/sec Loss 5.2976 LearningRate 0.0391 Epoch: 7 Global Step: 93100 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:45:54,574-Speed 3348.63 samples/sec Loss 5.2163 LearningRate 0.0391 Epoch: 7 Global Step: 93110 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:45:57,656-Speed 3324.52 samples/sec Loss 5.1968 LearningRate 0.0391 Epoch: 7 Global Step: 93120 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:46:00,790-Speed 3267.65 samples/sec Loss 5.2382 LearningRate 0.0391 Epoch: 7 Global Step: 93130 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:46:03,854-Speed 3343.02 samples/sec Loss 5.2823 LearningRate 0.0391 Epoch: 7 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:06,906-Speed 3356.98 samples/sec Loss 5.3784 LearningRate 0.0391 Epoch: 7 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:09,985-Speed 3326.13 samples/sec Loss 5.3170 LearningRate 0.0391 Epoch: 7 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:13,166-Speed 3220.85 samples/sec Loss 5.1787 LearningRate 0.0391 Epoch: 7 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:16,316-Speed 3251.36 
samples/sec Loss 5.2189 LearningRate 0.0391 Epoch: 7 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:19,422-Speed 3297.56 samples/sec Loss 5.2473 LearningRate 0.0390 Epoch: 7 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:22,601-Speed 3221.78 samples/sec Loss 5.1967 LearningRate 0.0390 Epoch: 7 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:25,712-Speed 3293.00 samples/sec Loss 5.2625 LearningRate 0.0390 Epoch: 7 Global Step: 93210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:28,842-Speed 3272.70 samples/sec Loss 5.3069 LearningRate 0.0390 Epoch: 7 Global Step: 93220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:31,914-Speed 3334.52 samples/sec Loss 5.2303 LearningRate 0.0390 Epoch: 7 Global Step: 93230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:35,077-Speed 3238.09 samples/sec Loss 5.2555 LearningRate 0.0390 Epoch: 7 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:46:38,212-Speed 3268.07 samples/sec Loss 5.3124 LearningRate 0.0390 Epoch: 7 Global Step: 93250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:41,318-Speed 3296.98 samples/sec Loss 5.1876 LearningRate 0.0390 Epoch: 7 Global Step: 93260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:44,368-Speed 3358.34 samples/sec Loss 5.2688 LearningRate 0.0390 Epoch: 7 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:47,454-Speed 3319.50 samples/sec Loss 5.3161 LearningRate 0.0390 Epoch: 7 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:50,570-Speed 3287.71 samples/sec Loss 5.1386 LearningRate 0.0390 Epoch: 7 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:53,661-Speed 3313.86 samples/sec Loss 5.1566 LearningRate 0.0390 Epoch: 7 
Global Step: 93300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:56,731-Speed 3335.94 samples/sec Loss 5.2731 LearningRate 0.0390 Epoch: 7 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:46:59,822-Speed 3314.73 samples/sec Loss 5.2972 LearningRate 0.0390 Epoch: 7 Global Step: 93320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:02,906-Speed 3320.91 samples/sec Loss 5.1974 LearningRate 0.0390 Epoch: 7 Global Step: 93330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:06,004-Speed 3307.32 samples/sec Loss 5.2340 LearningRate 0.0390 Epoch: 7 Global Step: 93340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:09,065-Speed 3346.08 samples/sec Loss 5.3972 LearningRate 0.0390 Epoch: 7 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:47:12,161-Speed 3307.72 samples/sec Loss 5.2669 LearningRate 0.0390 Epoch: 7 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:47:15,297-Speed 3267.17 samples/sec Loss 5.2603 LearningRate 0.0390 Epoch: 7 Global Step: 93370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:18,369-Speed 3334.69 samples/sec Loss 5.2776 LearningRate 0.0390 Epoch: 7 Global Step: 93380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:21,450-Speed 3323.85 samples/sec Loss 5.2919 LearningRate 0.0389 Epoch: 7 Global Step: 93390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:24,538-Speed 3317.74 samples/sec Loss 5.2695 LearningRate 0.0389 Epoch: 7 Global Step: 93400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:27,632-Speed 3309.86 samples/sec Loss 5.2657 LearningRate 0.0389 Epoch: 7 Global Step: 93410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:30,727-Speed 3309.88 samples/sec Loss 5.1884 LearningRate 0.0389 Epoch: 7 Global Step: 93420 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:47:33,800-Speed 3333.83 samples/sec Loss 5.2507 LearningRate 0.0389 Epoch: 7 Global Step: 93430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:36,945-Speed 3256.76 samples/sec Loss 5.3349 LearningRate 0.0389 Epoch: 7 Global Step: 93440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:47:40,027-Speed 3323.93 samples/sec Loss 5.2775 LearningRate 0.0389 Epoch: 7 Global Step: 93450 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:47:43,136-Speed 3294.56 samples/sec Loss 5.2842 LearningRate 0.0389 Epoch: 7 Global Step: 93460 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:47:46,206-Speed 3335.52 samples/sec Loss 5.2411 LearningRate 0.0389 Epoch: 7 Global Step: 93470 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:47:49,361-Speed 3246.85 samples/sec Loss 5.3006 LearningRate 0.0389 Epoch: 7 Global Step: 93480 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:47:52,440-Speed 3326.87 samples/sec Loss 5.2689 LearningRate 0.0389 Epoch: 7 Global Step: 93490 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:47:55,621-Speed 3220.48 samples/sec Loss 5.2660 LearningRate 0.0389 Epoch: 7 Global Step: 93500 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:47:58,670-Speed 3359.54 samples/sec Loss 5.3255 LearningRate 0.0389 Epoch: 7 Global Step: 93510 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:01,809-Speed 3263.61 samples/sec Loss 5.2422 LearningRate 0.0389 Epoch: 7 Global Step: 93520 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:04,870-Speed 3346.21 samples/sec Loss 5.2606 LearningRate 0.0389 Epoch: 7 Global Step: 93530 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:07,966-Speed 3308.44 samples/sec Loss 5.1625 LearningRate 0.0389 Epoch: 7 Global Step: 93540 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:11,035-Speed 3337.38 
samples/sec Loss 5.2816 LearningRate 0.0389 Epoch: 7 Global Step: 93550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:14,111-Speed 3331.00 samples/sec Loss 5.3119 LearningRate 0.0389 Epoch: 7 Global Step: 93560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:17,227-Speed 3287.05 samples/sec Loss 5.2096 LearningRate 0.0389 Epoch: 7 Global Step: 93570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:20,323-Speed 3307.94 samples/sec Loss 5.2219 LearningRate 0.0389 Epoch: 7 Global Step: 93580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:23,464-Speed 3261.39 samples/sec Loss 5.2297 LearningRate 0.0388 Epoch: 7 Global Step: 93590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:26,669-Speed 3195.91 samples/sec Loss 5.2058 LearningRate 0.0388 Epoch: 7 Global Step: 93600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:29,904-Speed 3166.79 samples/sec Loss 5.2414 LearningRate 0.0388 Epoch: 7 Global Step: 93610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:32,998-Speed 3310.91 samples/sec Loss 5.3222 LearningRate 0.0388 Epoch: 7 Global Step: 93620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:36,113-Speed 3288.05 samples/sec Loss 5.2458 LearningRate 0.0388 Epoch: 7 Global Step: 93630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:48:39,187-Speed 3331.75 samples/sec Loss 5.2307 LearningRate 0.0388 Epoch: 7 Global Step: 93640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:42,321-Speed 3268.46 samples/sec Loss 5.2504 LearningRate 0.0388 Epoch: 7 Global Step: 93650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:45,380-Speed 3348.67 samples/sec Loss 5.2408 LearningRate 0.0388 Epoch: 7 Global Step: 93660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:48,465-Speed 3319.93 samples/sec Loss 5.3301 LearningRate 0.0388 Epoch: 7 
Global Step: 93670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:51,617-Speed 3250.46 samples/sec Loss 5.1941 LearningRate 0.0388 Epoch: 7 Global Step: 93680 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:54,670-Speed 3355.38 samples/sec Loss 5.3320 LearningRate 0.0388 Epoch: 7 Global Step: 93690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:48:57,720-Speed 3357.59 samples/sec Loss 5.3112 LearningRate 0.0388 Epoch: 7 Global Step: 93700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:49:00,841-Speed 3282.08 samples/sec Loss 5.2224 LearningRate 0.0388 Epoch: 7 Global Step: 93710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:49:03,928-Speed 3318.53 samples/sec Loss 5.2087 LearningRate 0.0388 Epoch: 7 Global Step: 93720 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:49:07,022-Speed 3311.19 samples/sec Loss 5.3237 LearningRate 0.0388 Epoch: 7 Global Step: 93730 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:49:10,109-Speed 3319.26 samples/sec Loss 5.2742 LearningRate 0.0388 Epoch: 7 Global Step: 93740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:13,257-Speed 3253.86 samples/sec Loss 5.2470 LearningRate 0.0388 Epoch: 7 Global Step: 93750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:16,364-Speed 3296.21 samples/sec Loss 5.2719 LearningRate 0.0388 Epoch: 7 Global Step: 93760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:20,037-Speed 2788.68 samples/sec Loss 5.3905 LearningRate 0.0388 Epoch: 7 Global Step: 93770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:23,109-Speed 3334.17 samples/sec Loss 5.3238 LearningRate 0.0387 Epoch: 7 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:26,194-Speed 3320.70 samples/sec Loss 5.2909 LearningRate 0.0387 Epoch: 7 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:49:29,328-Speed 3268.27 samples/sec Loss 5.2638 LearningRate 0.0387 Epoch: 7 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:32,471-Speed 3258.76 samples/sec Loss 5.2672 LearningRate 0.0387 Epoch: 7 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:35,624-Speed 3249.73 samples/sec Loss 5.2688 LearningRate 0.0387 Epoch: 7 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:38,715-Speed 3314.00 samples/sec Loss 5.4024 LearningRate 0.0387 Epoch: 7 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:41,838-Speed 3278.97 samples/sec Loss 5.3043 LearningRate 0.0387 Epoch: 7 Global Step: 93840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:49:44,891-Speed 3355.11 samples/sec Loss 5.2996 LearningRate 0.0387 Epoch: 7 Global Step: 93850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:49:47,938-Speed 3362.19 samples/sec Loss 5.2195 LearningRate 0.0387 Epoch: 7 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:51,024-Speed 3319.92 samples/sec Loss 5.2473 LearningRate 0.0387 Epoch: 7 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:54,119-Speed 3309.22 samples/sec Loss 5.3038 LearningRate 0.0387 Epoch: 7 Global Step: 93880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:49:57,178-Speed 3348.67 samples/sec Loss 5.3243 LearningRate 0.0387 Epoch: 7 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:00,264-Speed 3319.23 samples/sec Loss 5.2588 LearningRate 0.0387 Epoch: 7 Global Step: 93900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:03,374-Speed 3293.61 samples/sec Loss 5.3385 LearningRate 0.0387 Epoch: 7 Global Step: 93910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:06,530-Speed 3246.08 
samples/sec Loss 5.1690 LearningRate 0.0387 Epoch: 7 Global Step: 93920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:09,599-Speed 3336.62 samples/sec Loss 5.3014 LearningRate 0.0387 Epoch: 7 Global Step: 93930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:12,704-Speed 3299.89 samples/sec Loss 5.3195 LearningRate 0.0387 Epoch: 7 Global Step: 93940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:15,827-Speed 3280.12 samples/sec Loss 5.3088 LearningRate 0.0387 Epoch: 7 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:18,911-Speed 3321.46 samples/sec Loss 5.2535 LearningRate 0.0387 Epoch: 7 Global Step: 93960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:50:21,948-Speed 3372.26 samples/sec Loss 5.2812 LearningRate 0.0387 Epoch: 7 Global Step: 93970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:25,071-Speed 3280.52 samples/sec Loss 5.2488 LearningRate 0.0386 Epoch: 7 Global Step: 93980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:28,190-Speed 3282.98 samples/sec Loss 5.2013 LearningRate 0.0386 Epoch: 7 Global Step: 93990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:31,311-Speed 3282.20 samples/sec Loss 5.2444 LearningRate 0.0386 Epoch: 7 Global Step: 94000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:34,385-Speed 3332.87 samples/sec Loss 5.2944 LearningRate 0.0386 Epoch: 7 Global Step: 94010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:37,493-Speed 3295.77 samples/sec Loss 5.2277 LearningRate 0.0386 Epoch: 7 Global Step: 94020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:40,583-Speed 3314.98 samples/sec Loss 5.2931 LearningRate 0.0386 Epoch: 7 Global Step: 94030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:43,713-Speed 3271.99 samples/sec Loss 5.3293 LearningRate 0.0386 Epoch: 7 
Global Step: 94040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:46,782-Speed 3338.15 samples/sec Loss 5.2117 LearningRate 0.0386 Epoch: 7 Global Step: 94050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:50,014-Speed 3168.97 samples/sec Loss 5.3529 LearningRate 0.0386 Epoch: 7 Global Step: 94060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:50:53,156-Speed 3259.91 samples/sec Loss 5.2922 LearningRate 0.0386 Epoch: 7 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:50:56,265-Speed 3294.29 samples/sec Loss 5.2823 LearningRate 0.0386 Epoch: 7 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:51:00,052-Speed 2704.99 samples/sec Loss 5.3072 LearningRate 0.0386 Epoch: 7 Global Step: 94090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:04,394-Speed 2359.23 samples/sec Loss 5.3447 LearningRate 0.0386 Epoch: 7 Global Step: 94100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:07,467-Speed 3333.22 samples/sec Loss 5.2170 LearningRate 0.0386 Epoch: 7 Global Step: 94110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:10,517-Speed 3358.12 samples/sec Loss 5.2997 LearningRate 0.0386 Epoch: 7 Global Step: 94120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:13,610-Speed 3312.21 samples/sec Loss 5.3168 LearningRate 0.0386 Epoch: 7 Global Step: 94130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:16,795-Speed 3215.43 samples/sec Loss 5.3361 LearningRate 0.0386 Epoch: 7 Global Step: 94140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:19,842-Speed 3361.54 samples/sec Loss 5.3529 LearningRate 0.0386 Epoch: 7 Global Step: 94150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:23,066-Speed 3178.04 samples/sec Loss 5.2719 LearningRate 0.0386 Epoch: 7 Global Step: 94160 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:51:26,158-Speed 3312.95 samples/sec Loss 5.2899 LearningRate 0.0386 Epoch: 7 Global Step: 94170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:29,233-Speed 3330.81 samples/sec Loss 5.2488 LearningRate 0.0385 Epoch: 7 Global Step: 94180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:32,295-Speed 3344.91 samples/sec Loss 5.3521 LearningRate 0.0385 Epoch: 7 Global Step: 94190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:35,392-Speed 3307.56 samples/sec Loss 5.3014 LearningRate 0.0385 Epoch: 7 Global Step: 94200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:38,462-Speed 3336.89 samples/sec Loss 5.3717 LearningRate 0.0385 Epoch: 7 Global Step: 94210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:41,597-Speed 3267.45 samples/sec Loss 5.2852 LearningRate 0.0385 Epoch: 7 Global Step: 94220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:44,704-Speed 3297.22 samples/sec Loss 5.3599 LearningRate 0.0385 Epoch: 7 Global Step: 94230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:47,852-Speed 3253.56 samples/sec Loss 5.1929 LearningRate 0.0385 Epoch: 7 Global Step: 94240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:50,955-Speed 3300.55 samples/sec Loss 5.3142 LearningRate 0.0385 Epoch: 7 Global Step: 94250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:51:54,072-Speed 3286.02 samples/sec Loss 5.2626 LearningRate 0.0385 Epoch: 7 Global Step: 94260 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:51:57,156-Speed 3321.62 samples/sec Loss 5.3553 LearningRate 0.0385 Epoch: 7 Global Step: 94270 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:00,225-Speed 3337.28 samples/sec Loss 5.3309 LearningRate 0.0385 Epoch: 7 Global Step: 94280 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:03,323-Speed 3306.66 
samples/sec Loss 5.2612 LearningRate 0.0385 Epoch: 7 Global Step: 94290 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:06,465-Speed 3260.51 samples/sec Loss 5.2778 LearningRate 0.0385 Epoch: 7 Global Step: 94300 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:09,558-Speed 3311.96 samples/sec Loss 5.2835 LearningRate 0.0385 Epoch: 7 Global Step: 94310 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:12,683-Speed 3277.71 samples/sec Loss 5.1968 LearningRate 0.0385 Epoch: 7 Global Step: 94320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:15,785-Speed 3302.30 samples/sec Loss 5.3173 LearningRate 0.0385 Epoch: 7 Global Step: 94330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:18,847-Speed 3346.34 samples/sec Loss 5.2345 LearningRate 0.0385 Epoch: 7 Global Step: 94340 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:21,917-Speed 3335.64 samples/sec Loss 5.2312 LearningRate 0.0385 Epoch: 7 Global Step: 94350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:52:25,102-Speed 3216.01 samples/sec Loss 5.3619 LearningRate 0.0385 Epoch: 7 Global Step: 94360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:28,255-Speed 3249.18 samples/sec Loss 5.3102 LearningRate 0.0385 Epoch: 7 Global Step: 94370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:31,444-Speed 3211.84 samples/sec Loss 5.3131 LearningRate 0.0384 Epoch: 7 Global Step: 94380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:34,534-Speed 3315.57 samples/sec Loss 5.3031 LearningRate 0.0384 Epoch: 7 Global Step: 94390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:37,768-Speed 3167.24 samples/sec Loss 5.3783 LearningRate 0.0384 Epoch: 7 Global Step: 94400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:40,859-Speed 3313.26 samples/sec Loss 5.2793 LearningRate 0.0384 Epoch: 7 
Global Step: 94410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:43,929-Speed 3337.31 samples/sec Loss 5.3241 LearningRate 0.0384 Epoch: 7 Global Step: 94420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:47,014-Speed 3320.02 samples/sec Loss 5.2571 LearningRate 0.0384 Epoch: 7 Global Step: 94430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:50,121-Speed 3296.65 samples/sec Loss 5.2803 LearningRate 0.0384 Epoch: 7 Global Step: 94440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:53,218-Speed 3307.75 samples/sec Loss 5.3512 LearningRate 0.0384 Epoch: 7 Global Step: 94450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:52:56,313-Speed 3310.15 samples/sec Loss 5.2968 LearningRate 0.0384 Epoch: 7 Global Step: 94460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:52:59,378-Speed 3341.66 samples/sec Loss 5.3387 LearningRate 0.0384 Epoch: 7 Global Step: 94470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:02,486-Speed 3296.37 samples/sec Loss 5.2665 LearningRate 0.0384 Epoch: 7 Global Step: 94480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:05,553-Speed 3339.10 samples/sec Loss 5.3515 LearningRate 0.0384 Epoch: 7 Global Step: 94490 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:08,609-Speed 3352.36 samples/sec Loss 5.3544 LearningRate 0.0384 Epoch: 7 Global Step: 94500 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:11,674-Speed 3341.61 samples/sec Loss 5.2598 LearningRate 0.0384 Epoch: 7 Global Step: 94510 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:14,740-Speed 3341.59 samples/sec Loss 5.2832 LearningRate 0.0384 Epoch: 7 Global Step: 94520 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:17,828-Speed 3316.80 samples/sec Loss 5.3307 LearningRate 0.0384 Epoch: 7 Global Step: 94530 Fp16 Grad Scale: 16384 Required: 13 
hours Training: 2022-04-27 09:53:20,883-Speed 3352.80 samples/sec Loss 5.3705 LearningRate 0.0384 Epoch: 7 Global Step: 94540 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:23,984-Speed 3303.71 samples/sec Loss 5.2566 LearningRate 0.0384 Epoch: 7 Global Step: 94550 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:27,063-Speed 3325.78 samples/sec Loss 5.2773 LearningRate 0.0384 Epoch: 7 Global Step: 94560 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:30,131-Speed 3339.60 samples/sec Loss 5.2385 LearningRate 0.0384 Epoch: 7 Global Step: 94570 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:33,179-Speed 3360.69 samples/sec Loss 5.3553 LearningRate 0.0384 Epoch: 7 Global Step: 94580 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:53:36,249-Speed 3336.58 samples/sec Loss 5.2584 LearningRate 0.0383 Epoch: 7 Global Step: 94590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:39,354-Speed 3297.84 samples/sec Loss 5.2399 LearningRate 0.0383 Epoch: 7 Global Step: 94600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:42,458-Speed 3300.77 samples/sec Loss 5.2780 LearningRate 0.0383 Epoch: 7 Global Step: 94610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:45,543-Speed 3319.91 samples/sec Loss 5.2683 LearningRate 0.0383 Epoch: 7 Global Step: 94620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:48,700-Speed 3244.74 samples/sec Loss 5.2949 LearningRate 0.0383 Epoch: 7 Global Step: 94630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:51,842-Speed 3260.12 samples/sec Loss 5.2902 LearningRate 0.0383 Epoch: 7 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:54,955-Speed 3290.71 samples/sec Loss 5.3884 LearningRate 0.0383 Epoch: 7 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:53:57,997-Speed 3366.94 
samples/sec Loss 5.3986 LearningRate 0.0383 Epoch: 7 Global Step: 94660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:01,051-Speed 3353.74 samples/sec Loss 5.3393 LearningRate 0.0383 Epoch: 7 Global Step: 94670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:04,212-Speed 3240.58 samples/sec Loss 5.2445 LearningRate 0.0383 Epoch: 7 Global Step: 94680 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:07,333-Speed 3282.28 samples/sec Loss 5.3450 LearningRate 0.0383 Epoch: 7 Global Step: 94690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:10,409-Speed 3330.09 samples/sec Loss 5.3058 LearningRate 0.0383 Epoch: 7 Global Step: 94700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:13,528-Speed 3283.86 samples/sec Loss 5.3888 LearningRate 0.0383 Epoch: 7 Global Step: 94710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:16,627-Speed 3305.76 samples/sec Loss 5.3723 LearningRate 0.0383 Epoch: 7 Global Step: 94720 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:19,690-Speed 3344.43 samples/sec Loss 5.2192 LearningRate 0.0383 Epoch: 7 Global Step: 94730 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:22,756-Speed 3340.12 samples/sec Loss 5.2675 LearningRate 0.0383 Epoch: 7 Global Step: 94740 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:25,881-Speed 3278.66 samples/sec Loss 5.2806 LearningRate 0.0383 Epoch: 7 Global Step: 94750 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:54:28,982-Speed 3303.27 samples/sec Loss 5.3768 LearningRate 0.0383 Epoch: 7 Global Step: 94760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:32,140-Speed 3243.63 samples/sec Loss 5.3169 LearningRate 0.0383 Epoch: 7 Global Step: 94770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:35,203-Speed 3343.89 samples/sec Loss 5.3923 LearningRate 0.0383 Epoch: 7 
Global Step: 94780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:38,262-Speed 3348.33 samples/sec Loss 5.4290 LearningRate 0.0382 Epoch: 7 Global Step: 94790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:41,338-Speed 3330.92 samples/sec Loss 5.3360 LearningRate 0.0382 Epoch: 7 Global Step: 94800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:44,461-Speed 3279.35 samples/sec Loss 5.2637 LearningRate 0.0382 Epoch: 7 Global Step: 94810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:47,546-Speed 3320.98 samples/sec Loss 5.2676 LearningRate 0.0382 Epoch: 7 Global Step: 94820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:50,686-Speed 3261.82 samples/sec Loss 5.2464 LearningRate 0.0382 Epoch: 7 Global Step: 94830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:53,828-Speed 3259.58 samples/sec Loss 5.2139 LearningRate 0.0382 Epoch: 7 Global Step: 94840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:54:56,942-Speed 3289.59 samples/sec Loss 5.2390 LearningRate 0.0382 Epoch: 7 Global Step: 94850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:00,065-Speed 3280.21 samples/sec Loss 5.2962 LearningRate 0.0382 Epoch: 7 Global Step: 94860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:03,236-Speed 3230.38 samples/sec Loss 5.3366 LearningRate 0.0382 Epoch: 7 Global Step: 94870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:06,312-Speed 3329.85 samples/sec Loss 5.3136 LearningRate 0.0382 Epoch: 7 Global Step: 94880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:09,412-Speed 3304.45 samples/sec Loss 5.2700 LearningRate 0.0382 Epoch: 7 Global Step: 94890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:12,537-Speed 3277.21 samples/sec Loss 5.4052 LearningRate 0.0382 Epoch: 7 Global Step: 94900 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:55:15,653-Speed 3287.91 samples/sec Loss 5.3301 LearningRate 0.0382 Epoch: 7 Global Step: 94910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:18,774-Speed 3281.70 samples/sec Loss 5.3563 LearningRate 0.0382 Epoch: 7 Global Step: 94920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:21,832-Speed 3349.62 samples/sec Loss 5.3137 LearningRate 0.0382 Epoch: 7 Global Step: 94930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:24,999-Speed 3234.44 samples/sec Loss 5.2372 LearningRate 0.0382 Epoch: 7 Global Step: 94940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:28,159-Speed 3241.29 samples/sec Loss 5.3211 LearningRate 0.0382 Epoch: 7 Global Step: 94950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:31,266-Speed 3297.45 samples/sec Loss 5.3778 LearningRate 0.0382 Epoch: 7 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:55:34,425-Speed 3242.47 samples/sec Loss 5.3114 LearningRate 0.0382 Epoch: 7 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:55:37,596-Speed 3230.26 samples/sec Loss 5.3201 LearningRate 0.0382 Epoch: 7 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:55:40,729-Speed 3269.45 samples/sec Loss 5.4498 LearningRate 0.0381 Epoch: 7 Global Step: 94990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:43,838-Speed 3294.15 samples/sec Loss 5.2309 LearningRate 0.0381 Epoch: 7 Global Step: 95000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:46,934-Speed 3308.92 samples/sec Loss 5.3325 LearningRate 0.0381 Epoch: 7 Global Step: 95010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:50,081-Speed 3254.38 samples/sec Loss 5.3044 LearningRate 0.0381 Epoch: 7 Global Step: 95020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:53,250-Speed 3232.89 
samples/sec Loss 5.3002 LearningRate 0.0381 Epoch: 7 Global Step: 95030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:56,378-Speed 3274.16 samples/sec Loss 5.3455 LearningRate 0.0381 Epoch: 7 Global Step: 95040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:55:59,463-Speed 3320.47 samples/sec Loss 5.2956 LearningRate 0.0381 Epoch: 7 Global Step: 95050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:02,539-Speed 3330.32 samples/sec Loss 5.2041 LearningRate 0.0381 Epoch: 7 Global Step: 95060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:05,653-Speed 3289.53 samples/sec Loss 5.3399 LearningRate 0.0381 Epoch: 7 Global Step: 95070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:08,749-Speed 3308.46 samples/sec Loss 5.3152 LearningRate 0.0381 Epoch: 7 Global Step: 95080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:11,812-Speed 3344.46 samples/sec Loss 5.2589 LearningRate 0.0381 Epoch: 7 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:56:14,907-Speed 3309.41 samples/sec Loss 5.2511 LearningRate 0.0381 Epoch: 7 Global Step: 95100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:56:17,986-Speed 3327.47 samples/sec Loss 5.2677 LearningRate 0.0381 Epoch: 7 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:56:21,024-Speed 3371.17 samples/sec Loss 5.3847 LearningRate 0.0381 Epoch: 7 Global Step: 95120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:24,081-Speed 3351.35 samples/sec Loss 5.2781 LearningRate 0.0381 Epoch: 7 Global Step: 95130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:27,157-Speed 3330.08 samples/sec Loss 5.3654 LearningRate 0.0381 Epoch: 7 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:30,243-Speed 3318.70 samples/sec Loss 5.2750 LearningRate 0.0381 Epoch: 7 
Global Step: 95150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:33,351-Speed 3295.26 samples/sec Loss 5.3445 LearningRate 0.0381 Epoch: 7 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:36,466-Speed 3289.36 samples/sec Loss 5.3284 LearningRate 0.0381 Epoch: 7 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:39,576-Speed 3292.88 samples/sec Loss 5.4468 LearningRate 0.0381 Epoch: 7 Global Step: 95180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:42,671-Speed 3309.77 samples/sec Loss 5.3551 LearningRate 0.0380 Epoch: 7 Global Step: 95190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:45,775-Speed 3300.49 samples/sec Loss 5.2534 LearningRate 0.0380 Epoch: 7 Global Step: 95200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:48,853-Speed 3327.50 samples/sec Loss 5.2908 LearningRate 0.0380 Epoch: 7 Global Step: 95210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:56:51,997-Speed 3258.82 samples/sec Loss 5.3919 LearningRate 0.0380 Epoch: 7 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:56:55,184-Speed 3213.74 samples/sec Loss 5.3201 LearningRate 0.0380 Epoch: 7 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:56:58,265-Speed 3324.76 samples/sec Loss 5.3687 LearningRate 0.0380 Epoch: 7 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:57:01,385-Speed 3282.46 samples/sec Loss 5.3497 LearningRate 0.0380 Epoch: 7 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:57:04,495-Speed 3294.20 samples/sec Loss 5.3794 LearningRate 0.0380 Epoch: 7 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:57:07,593-Speed 3306.72 samples/sec Loss 5.2751 LearningRate 0.0380 Epoch: 7 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 13 
hours Training: 2022-04-27 09:57:10,770-Speed 3223.92 samples/sec Loss 5.3510 LearningRate 0.0380 Epoch: 7 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:57:13,875-Speed 3298.75 samples/sec Loss 5.3399 LearningRate 0.0380 Epoch: 7 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:57:17,010-Speed 3267.30 samples/sec Loss 5.2557 LearningRate 0.0380 Epoch: 7 Global Step: 95300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:57:20,127-Speed 3286.92 samples/sec Loss 5.2465 LearningRate 0.0380 Epoch: 7 Global Step: 95310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:57:23,217-Speed 3314.24 samples/sec Loss 5.3148 LearningRate 0.0380 Epoch: 7 Global Step: 95320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:26,324-Speed 3297.10 samples/sec Loss 5.4135 LearningRate 0.0380 Epoch: 7 Global Step: 95330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:29,406-Speed 3323.97 samples/sec Loss 5.2413 LearningRate 0.0380 Epoch: 7 Global Step: 95340 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:32,493-Speed 3318.01 samples/sec Loss 5.2688 LearningRate 0.0380 Epoch: 7 Global Step: 95350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:35,547-Speed 3353.56 samples/sec Loss 5.3680 LearningRate 0.0380 Epoch: 7 Global Step: 95360 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:38,619-Speed 3334.51 samples/sec Loss 5.3135 LearningRate 0.0380 Epoch: 7 Global Step: 95370 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:41,708-Speed 3316.41 samples/sec Loss 5.2978 LearningRate 0.0380 Epoch: 7 Global Step: 95380 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:44,760-Speed 3355.64 samples/sec Loss 5.3874 LearningRate 0.0379 Epoch: 7 Global Step: 95390 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:47,834-Speed 3333.08 
samples/sec Loss 5.3651 LearningRate 0.0379 Epoch: 7 Global Step: 95400 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:50,904-Speed 3336.49 samples/sec Loss 5.3923 LearningRate 0.0379 Epoch: 7 Global Step: 95410 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 09:57:54,010-Speed 3297.85 samples/sec Loss 5.3960 LearningRate 0.0379 Epoch: 7 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:57:57,118-Speed 3295.34 samples/sec Loss 5.2405 LearningRate 0.0379 Epoch: 7 Global Step: 95430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:00,199-Speed 3324.85 samples/sec Loss 5.3222 LearningRate 0.0379 Epoch: 7 Global Step: 95440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:03,275-Speed 3330.59 samples/sec Loss 5.3131 LearningRate 0.0379 Epoch: 7 Global Step: 95450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:06,334-Speed 3348.49 samples/sec Loss 5.3187 LearningRate 0.0379 Epoch: 7 Global Step: 95460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:09,396-Speed 3345.50 samples/sec Loss 5.4582 LearningRate 0.0379 Epoch: 7 Global Step: 95470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:12,511-Speed 3287.95 samples/sec Loss 5.3694 LearningRate 0.0379 Epoch: 7 Global Step: 95480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:15,695-Speed 3217.30 samples/sec Loss 5.4266 LearningRate 0.0379 Epoch: 7 Global Step: 95490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:18,788-Speed 3312.22 samples/sec Loss 5.3552 LearningRate 0.0379 Epoch: 7 Global Step: 95500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:21,914-Speed 3276.98 samples/sec Loss 5.3844 LearningRate 0.0379 Epoch: 7 Global Step: 95510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:25,042-Speed 3274.47 samples/sec Loss 5.3240 LearningRate 0.0379 Epoch: 7 
Global Step: 95520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:58:28,111-Speed 3337.88 samples/sec Loss 5.3545 LearningRate 0.0379 Epoch: 7 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:58:31,181-Speed 3336.34 samples/sec Loss 5.2958 LearningRate 0.0379 Epoch: 7 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:58:34,264-Speed 3323.31 samples/sec Loss 5.3023 LearningRate 0.0379 Epoch: 7 Global Step: 95550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:37,420-Speed 3245.80 samples/sec Loss 5.3587 LearningRate 0.0379 Epoch: 7 Global Step: 95560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:40,505-Speed 3319.57 samples/sec Loss 5.3332 LearningRate 0.0379 Epoch: 7 Global Step: 95570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:43,566-Speed 3345.99 samples/sec Loss 5.3928 LearningRate 0.0379 Epoch: 7 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:46,692-Speed 3277.61 samples/sec Loss 5.3175 LearningRate 0.0378 Epoch: 7 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:49,862-Speed 3230.94 samples/sec Loss 5.3793 LearningRate 0.0378 Epoch: 7 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:52,947-Speed 3320.21 samples/sec Loss 5.2242 LearningRate 0.0378 Epoch: 7 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:56,057-Speed 3293.72 samples/sec Loss 5.3091 LearningRate 0.0378 Epoch: 7 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:58:59,107-Speed 3358.08 samples/sec Loss 5.3486 LearningRate 0.0378 Epoch: 7 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:02,191-Speed 3322.24 samples/sec Loss 5.3702 LearningRate 0.0378 Epoch: 7 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 09:59:05,312-Speed 3281.54 samples/sec Loss 5.2874 LearningRate 0.0378 Epoch: 7 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:59:08,369-Speed 3351.04 samples/sec Loss 5.3016 LearningRate 0.0378 Epoch: 7 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:59:11,466-Speed 3307.64 samples/sec Loss 5.3374 LearningRate 0.0378 Epoch: 7 Global Step: 95670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:14,559-Speed 3312.37 samples/sec Loss 5.3354 LearningRate 0.0378 Epoch: 7 Global Step: 95680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:17,699-Speed 3262.02 samples/sec Loss 5.3020 LearningRate 0.0378 Epoch: 7 Global Step: 95690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:20,836-Speed 3264.80 samples/sec Loss 5.2512 LearningRate 0.0378 Epoch: 7 Global Step: 95700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:23,901-Speed 3342.12 samples/sec Loss 5.2960 LearningRate 0.0378 Epoch: 7 Global Step: 95710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:26,985-Speed 3321.93 samples/sec Loss 5.2201 LearningRate 0.0378 Epoch: 7 Global Step: 95720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:30,122-Speed 3264.96 samples/sec Loss 5.3241 LearningRate 0.0378 Epoch: 7 Global Step: 95730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:33,191-Speed 3337.44 samples/sec Loss 5.3699 LearningRate 0.0378 Epoch: 7 Global Step: 95740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:36,350-Speed 3243.10 samples/sec Loss 5.3558 LearningRate 0.0378 Epoch: 7 Global Step: 95750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:39,421-Speed 3334.91 samples/sec Loss 5.2926 LearningRate 0.0378 Epoch: 7 Global Step: 95760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 09:59:42,532-Speed 3292.89 
samples/sec Loss 5.2937 LearningRate 0.0378 Epoch: 7 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:59:45,586-Speed 3354.11 samples/sec Loss 5.2769 LearningRate 0.0378 Epoch: 7 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:59:48,674-Speed 3316.94 samples/sec Loss 5.3233 LearningRate 0.0377 Epoch: 7 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:59:51,756-Speed 3323.86 samples/sec Loss 5.2898 LearningRate 0.0377 Epoch: 7 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:59:54,970-Speed 3186.93 samples/sec Loss 5.3175 LearningRate 0.0377 Epoch: 7 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 09:59:58,054-Speed 3320.73 samples/sec Loss 5.3674 LearningRate 0.0377 Epoch: 7 Global Step: 95820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:00:01,138-Speed 3321.44 samples/sec Loss 5.3349 LearningRate 0.0377 Epoch: 7 Global Step: 95830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:00:04,238-Speed 3305.23 samples/sec Loss 5.2982 LearningRate 0.0377 Epoch: 7 Global Step: 95840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:00:07,309-Speed 3334.96 samples/sec Loss 5.1819 LearningRate 0.0377 Epoch: 7 Global Step: 95850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:10,369-Speed 3346.90 samples/sec Loss 5.3536 LearningRate 0.0377 Epoch: 7 Global Step: 95860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:13,497-Speed 3275.61 samples/sec Loss 5.2641 LearningRate 0.0377 Epoch: 7 Global Step: 95870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:16,637-Speed 3262.52 samples/sec Loss 5.2610 LearningRate 0.0377 Epoch: 7 Global Step: 95880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:19,687-Speed 3357.26 samples/sec Loss 5.3079 LearningRate 0.0377 Epoch: 7 
Global Step: 95890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:22,748-Speed 3346.77 samples/sec Loss 5.3294 LearningRate 0.0377 Epoch: 7 Global Step: 95900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:25,828-Speed 3326.55 samples/sec Loss 5.3002 LearningRate 0.0377 Epoch: 7 Global Step: 95910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:28,967-Speed 3262.14 samples/sec Loss 5.3095 LearningRate 0.0377 Epoch: 7 Global Step: 95920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:32,092-Speed 3278.09 samples/sec Loss 5.3713 LearningRate 0.0377 Epoch: 7 Global Step: 95930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:35,207-Speed 3288.24 samples/sec Loss 5.3053 LearningRate 0.0377 Epoch: 7 Global Step: 95940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:38,365-Speed 3243.90 samples/sec Loss 5.3968 LearningRate 0.0377 Epoch: 7 Global Step: 95950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:41,609-Speed 3157.81 samples/sec Loss 5.4066 LearningRate 0.0377 Epoch: 7 Global Step: 95960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:44,752-Speed 3258.82 samples/sec Loss 5.3311 LearningRate 0.0377 Epoch: 7 Global Step: 95970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:47,885-Speed 3269.59 samples/sec Loss 5.3286 LearningRate 0.0377 Epoch: 7 Global Step: 95980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:50,948-Speed 3344.96 samples/sec Loss 5.2159 LearningRate 0.0377 Epoch: 7 Global Step: 95990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:54,083-Speed 3267.09 samples/sec Loss 5.3980 LearningRate 0.0376 Epoch: 7 Global Step: 96000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:00:57,145-Speed 3345.72 samples/sec Loss 5.3143 LearningRate 0.0376 Epoch: 7 Global Step: 96010 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:01:00,279-Speed 3268.04 samples/sec Loss 5.3194 LearningRate 0.0376 Epoch: 7 Global Step: 96020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:01:03,467-Speed 3213.87 samples/sec Loss 5.3420 LearningRate 0.0376 Epoch: 7 Global Step: 96030 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:06,641-Speed 3226.87 samples/sec Loss 5.3596 LearningRate 0.0376 Epoch: 7 Global Step: 96040 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:09,745-Speed 3299.51 samples/sec Loss 5.3545 LearningRate 0.0376 Epoch: 7 Global Step: 96050 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:12,857-Speed 3291.71 samples/sec Loss 5.3191 LearningRate 0.0376 Epoch: 7 Global Step: 96060 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:15,993-Speed 3266.60 samples/sec Loss 5.3159 LearningRate 0.0376 Epoch: 7 Global Step: 96070 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:19,166-Speed 3227.78 samples/sec Loss 5.3192 LearningRate 0.0376 Epoch: 7 Global Step: 96080 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:22,282-Speed 3286.90 samples/sec Loss 5.3227 LearningRate 0.0376 Epoch: 7 Global Step: 96090 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:25,380-Speed 3306.52 samples/sec Loss 5.3322 LearningRate 0.0376 Epoch: 7 Global Step: 96100 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:28,513-Speed 3270.10 samples/sec Loss 5.3009 LearningRate 0.0376 Epoch: 7 Global Step: 96110 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:31,607-Speed 3311.00 samples/sec Loss 5.3882 LearningRate 0.0376 Epoch: 7 Global Step: 96120 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:34,746-Speed 3262.43 samples/sec Loss 5.3015 LearningRate 0.0376 Epoch: 7 Global Step: 96130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:01:37,837-Speed 3314.03 
samples/sec Loss 5.1983 LearningRate 0.0376 Epoch: 7 Global Step: 96140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:01:40,903-Speed 3341.39 samples/sec Loss 5.2898 LearningRate 0.0376 Epoch: 7 Global Step: 96150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:44,028-Speed 3278.03 samples/sec Loss 5.2888 LearningRate 0.0376 Epoch: 7 Global Step: 96160 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:47,082-Speed 3353.57 samples/sec Loss 5.2800 LearningRate 0.0376 Epoch: 7 Global Step: 96170 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:50,176-Speed 3310.50 samples/sec Loss 5.3105 LearningRate 0.0376 Epoch: 7 Global Step: 96180 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:53,293-Speed 3286.44 samples/sec Loss 5.2948 LearningRate 0.0376 Epoch: 7 Global Step: 96190 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:56,393-Speed 3304.03 samples/sec Loss 5.2857 LearningRate 0.0375 Epoch: 7 Global Step: 96200 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:01:59,491-Speed 3306.80 samples/sec Loss 5.3263 LearningRate 0.0375 Epoch: 7 Global Step: 96210 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:02:02,612-Speed 3281.76 samples/sec Loss 5.2934 LearningRate 0.0375 Epoch: 7 Global Step: 96220 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:02:05,752-Speed 3263.09 samples/sec Loss 5.3511 LearningRate 0.0375 Epoch: 7 Global Step: 96230 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:02:08,818-Speed 3340.90 samples/sec Loss 5.3284 LearningRate 0.0375 Epoch: 7 Global Step: 96240 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:02:11,951-Speed 3269.03 samples/sec Loss 5.3310 LearningRate 0.0375 Epoch: 7 Global Step: 96250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:15,063-Speed 3292.05 samples/sec Loss 5.2003 LearningRate 0.0375 Epoch: 7 
Global Step: 96260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:18,197-Speed 3267.66 samples/sec Loss 5.2754 LearningRate 0.0375 Epoch: 7 Global Step: 96270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:21,288-Speed 3313.79 samples/sec Loss 5.3986 LearningRate 0.0375 Epoch: 7 Global Step: 96280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:24,409-Speed 3282.05 samples/sec Loss 5.3582 LearningRate 0.0375 Epoch: 7 Global Step: 96290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:27,557-Speed 3253.99 samples/sec Loss 5.3182 LearningRate 0.0375 Epoch: 7 Global Step: 96300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:30,800-Speed 3158.69 samples/sec Loss 5.3076 LearningRate 0.0375 Epoch: 7 Global Step: 96310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:33,893-Speed 3311.68 samples/sec Loss 5.3318 LearningRate 0.0375 Epoch: 7 Global Step: 96320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:36,976-Speed 3322.83 samples/sec Loss 5.2402 LearningRate 0.0375 Epoch: 7 Global Step: 96330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:40,069-Speed 3311.06 samples/sec Loss 5.3459 LearningRate 0.0375 Epoch: 7 Global Step: 96340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:43,155-Speed 3319.84 samples/sec Loss 5.2964 LearningRate 0.0375 Epoch: 7 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:02:46,232-Speed 3328.88 samples/sec Loss 5.2569 LearningRate 0.0375 Epoch: 7 Global Step: 96360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:49,341-Speed 3294.15 samples/sec Loss 5.3401 LearningRate 0.0375 Epoch: 7 Global Step: 96370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:52,470-Speed 3273.82 samples/sec Loss 5.2795 LearningRate 0.0375 Epoch: 7 Global Step: 96380 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:02:55,648-Speed 3224.17 samples/sec Loss 5.2901 LearningRate 0.0375 Epoch: 7 Global Step: 96390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:02:58,737-Speed 3315.75 samples/sec Loss 5.3566 LearningRate 0.0374 Epoch: 7 Global Step: 96400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:01,831-Speed 3309.92 samples/sec Loss 5.3599 LearningRate 0.0374 Epoch: 7 Global Step: 96410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:04,924-Speed 3312.03 samples/sec Loss 5.2703 LearningRate 0.0374 Epoch: 7 Global Step: 96420 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:08,026-Speed 3302.22 samples/sec Loss 5.2833 LearningRate 0.0374 Epoch: 7 Global Step: 96430 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:11,100-Speed 3331.48 samples/sec Loss 5.2702 LearningRate 0.0374 Epoch: 7 Global Step: 96440 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:14,229-Speed 3274.94 samples/sec Loss 5.3408 LearningRate 0.0374 Epoch: 7 Global Step: 96450 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:17,312-Speed 3322.24 samples/sec Loss 5.3057 LearningRate 0.0374 Epoch: 7 Global Step: 96460 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:20,404-Speed 3312.30 samples/sec Loss 5.2329 LearningRate 0.0374 Epoch: 7 Global Step: 96470 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:23,491-Speed 3317.90 samples/sec Loss 5.3752 LearningRate 0.0374 Epoch: 7 Global Step: 96480 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:26,610-Speed 3285.04 samples/sec Loss 5.2391 LearningRate 0.0374 Epoch: 7 Global Step: 96490 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:29,785-Speed 3225.84 samples/sec Loss 5.4037 LearningRate 0.0374 Epoch: 7 Global Step: 96500 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:32,852-Speed 3339.11 
samples/sec Loss 5.3176 LearningRate 0.0374 Epoch: 7 Global Step: 96510 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:03:35,949-Speed 3307.87 samples/sec Loss 5.2986 LearningRate 0.0374 Epoch: 7 Global Step: 96520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:39,035-Speed 3319.81 samples/sec Loss 5.3841 LearningRate 0.0374 Epoch: 7 Global Step: 96530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:42,205-Speed 3230.90 samples/sec Loss 5.4297 LearningRate 0.0374 Epoch: 7 Global Step: 96540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:45,299-Speed 3311.23 samples/sec Loss 5.2918 LearningRate 0.0374 Epoch: 7 Global Step: 96550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:48,419-Speed 3282.31 samples/sec Loss 5.3189 LearningRate 0.0374 Epoch: 7 Global Step: 96560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:51,512-Speed 3311.92 samples/sec Loss 5.3943 LearningRate 0.0374 Epoch: 7 Global Step: 96570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:54,597-Speed 3320.90 samples/sec Loss 5.3848 LearningRate 0.0374 Epoch: 7 Global Step: 96580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:03:57,700-Speed 3301.20 samples/sec Loss 5.3159 LearningRate 0.0374 Epoch: 7 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:04:00,770-Speed 3335.74 samples/sec Loss 5.4769 LearningRate 0.0373 Epoch: 7 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:04:03,889-Speed 3284.47 samples/sec Loss 5.3286 LearningRate 0.0373 Epoch: 7 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:04:07,072-Speed 3217.85 samples/sec Loss 5.2501 LearningRate 0.0373 Epoch: 7 Global Step: 96620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:04:10,183-Speed 3292.79 samples/sec Loss 5.3181 LearningRate 0.0373 Epoch: 7 
Global Step: 96630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:04:13,302-Speed 3283.75 samples/sec Loss 5.2882 LearningRate 0.0373 Epoch: 7 Global Step: 96640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:16,413-Speed 3292.36 samples/sec Loss 5.3099 LearningRate 0.0373 Epoch: 7 Global Step: 96650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:19,595-Speed 3219.59 samples/sec Loss 5.4560 LearningRate 0.0373 Epoch: 7 Global Step: 96660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:22,712-Speed 3286.49 samples/sec Loss 5.2525 LearningRate 0.0373 Epoch: 7 Global Step: 96670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:25,880-Speed 3233.10 samples/sec Loss 5.3971 LearningRate 0.0373 Epoch: 7 Global Step: 96680 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:29,055-Speed 3226.40 samples/sec Loss 5.2982 LearningRate 0.0373 Epoch: 7 Global Step: 96690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:32,189-Speed 3268.17 samples/sec Loss 5.2499 LearningRate 0.0373 Epoch: 7 Global Step: 96700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:35,299-Speed 3294.43 samples/sec Loss 5.1688 LearningRate 0.0373 Epoch: 7 Global Step: 96710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:38,383-Speed 3321.26 samples/sec Loss 5.2782 LearningRate 0.0373 Epoch: 7 Global Step: 96720 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:41,464-Speed 3324.61 samples/sec Loss 5.3293 LearningRate 0.0373 Epoch: 7 Global Step: 96730 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:04:44,569-Speed 3299.33 samples/sec Loss 5.2877 LearningRate 0.0373 Epoch: 7 Global Step: 96740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:04:47,710-Speed 3260.48 samples/sec Loss 5.2156 LearningRate 0.0373 Epoch: 7 Global Step: 96750 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:04:50,805-Speed 3309.84 samples/sec Loss 5.2529 LearningRate 0.0373 Epoch: 7 Global Step: 96760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:04:53,969-Speed 3237.84 samples/sec Loss 5.3719 LearningRate 0.0373 Epoch: 7 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:04:57,025-Speed 3351.31 samples/sec Loss 5.3310 LearningRate 0.0373 Epoch: 7 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:00,168-Speed 3258.92 samples/sec Loss 5.3022 LearningRate 0.0373 Epoch: 7 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:03,283-Speed 3289.15 samples/sec Loss 5.3860 LearningRate 0.0373 Epoch: 7 Global Step: 96800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:06,423-Speed 3261.85 samples/sec Loss 5.4094 LearningRate 0.0372 Epoch: 7 Global Step: 96810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:09,489-Speed 3340.68 samples/sec Loss 5.4975 LearningRate 0.0372 Epoch: 7 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:12,712-Speed 3178.14 samples/sec Loss 5.3913 LearningRate 0.0372 Epoch: 7 Global Step: 96830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:15,883-Speed 3230.41 samples/sec Loss 5.2524 LearningRate 0.0372 Epoch: 7 Global Step: 96840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:05:19,077-Speed 3206.84 samples/sec Loss 5.3174 LearningRate 0.0372 Epoch: 7 Global Step: 96850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:05:22,179-Speed 3302.18 samples/sec Loss 5.3309 LearningRate 0.0372 Epoch: 7 Global Step: 96860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:05:25,293-Speed 3289.77 samples/sec Loss 5.3657 LearningRate 0.0372 Epoch: 7 Global Step: 96870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:28,411-Speed 3285.46 
samples/sec Loss 5.3872 LearningRate 0.0372 Epoch: 7 Global Step: 96880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:31,497-Speed 3319.64 samples/sec Loss 5.4245 LearningRate 0.0372 Epoch: 7 Global Step: 96890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:34,569-Speed 3333.87 samples/sec Loss 5.1820 LearningRate 0.0372 Epoch: 7 Global Step: 96900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:37,677-Speed 3296.43 samples/sec Loss 5.4569 LearningRate 0.0372 Epoch: 7 Global Step: 96910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:40,886-Speed 3191.55 samples/sec Loss 5.3051 LearningRate 0.0372 Epoch: 7 Global Step: 96920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:44,046-Speed 3241.60 samples/sec Loss 5.2172 LearningRate 0.0372 Epoch: 7 Global Step: 96930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:47,101-Speed 3353.33 samples/sec Loss 5.4627 LearningRate 0.0372 Epoch: 7 Global Step: 96940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:50,249-Speed 3253.59 samples/sec Loss 5.3190 LearningRate 0.0372 Epoch: 7 Global Step: 96950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:53,362-Speed 3290.20 samples/sec Loss 5.3194 LearningRate 0.0372 Epoch: 7 Global Step: 96960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:56,415-Speed 3355.78 samples/sec Loss 5.2569 LearningRate 0.0372 Epoch: 7 Global Step: 96970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:05:59,490-Speed 3331.29 samples/sec Loss 5.3006 LearningRate 0.0372 Epoch: 7 Global Step: 96980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:02,577-Speed 3317.74 samples/sec Loss 5.3213 LearningRate 0.0372 Epoch: 7 Global Step: 96990 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:05,641-Speed 3342.55 samples/sec Loss 5.4485 LearningRate 0.0372 Epoch: 7 
Global Step: 97000 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:08,774-Speed 3269.48 samples/sec Loss 5.2492 LearningRate 0.0371 Epoch: 7 Global Step: 97010 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:11,856-Speed 3324.96 samples/sec Loss 5.3019 LearningRate 0.0371 Epoch: 7 Global Step: 97020 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:14,999-Speed 3258.32 samples/sec Loss 5.3518 LearningRate 0.0371 Epoch: 7 Global Step: 97030 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:18,071-Speed 3334.54 samples/sec Loss 5.2722 LearningRate 0.0371 Epoch: 7 Global Step: 97040 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:21,132-Speed 3346.70 samples/sec Loss 5.2885 LearningRate 0.0371 Epoch: 7 Global Step: 97050 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:24,243-Speed 3292.95 samples/sec Loss 5.2335 LearningRate 0.0371 Epoch: 7 Global Step: 97060 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:27,339-Speed 3308.23 samples/sec Loss 5.4191 LearningRate 0.0371 Epoch: 7 Global Step: 97070 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:30,414-Speed 3331.40 samples/sec Loss 5.3265 LearningRate 0.0371 Epoch: 7 Global Step: 97080 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:06:33,467-Speed 3355.26 samples/sec Loss 5.3795 LearningRate 0.0371 Epoch: 7 Global Step: 97090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:36,572-Speed 3298.58 samples/sec Loss 5.3587 LearningRate 0.0371 Epoch: 7 Global Step: 97100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:39,674-Speed 3301.95 samples/sec Loss 5.2981 LearningRate 0.0371 Epoch: 7 Global Step: 97110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:42,801-Speed 3276.42 samples/sec Loss 5.3606 LearningRate 0.0371 Epoch: 7 Global Step: 97120 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:06:45,911-Speed 3293.38 samples/sec Loss 5.2299 LearningRate 0.0371 Epoch: 7 Global Step: 97130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:49,059-Speed 3254.10 samples/sec Loss 5.3730 LearningRate 0.0371 Epoch: 7 Global Step: 97140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:52,142-Speed 3322.42 samples/sec Loss 5.2679 LearningRate 0.0371 Epoch: 7 Global Step: 97150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:55,193-Speed 3357.58 samples/sec Loss 5.1462 LearningRate 0.0371 Epoch: 7 Global Step: 97160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:06:58,279-Speed 3319.29 samples/sec Loss 5.3320 LearningRate 0.0371 Epoch: 7 Global Step: 97170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:01,375-Speed 3308.32 samples/sec Loss 5.2827 LearningRate 0.0371 Epoch: 7 Global Step: 97180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:04,447-Speed 3334.14 samples/sec Loss 5.2627 LearningRate 0.0371 Epoch: 7 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:07:07,559-Speed 3291.76 samples/sec Loss 5.3462 LearningRate 0.0371 Epoch: 7 Global Step: 97200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:07:10,635-Speed 3329.90 samples/sec Loss 5.3842 LearningRate 0.0370 Epoch: 7 Global Step: 97210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:07:13,787-Speed 3250.08 samples/sec Loss 5.3169 LearningRate 0.0370 Epoch: 7 Global Step: 97220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:07:16,927-Speed 3261.66 samples/sec Loss 5.2983 LearningRate 0.0370 Epoch: 7 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:07:20,030-Speed 3301.66 samples/sec Loss 5.3724 LearningRate 0.0370 Epoch: 7 Global Step: 97240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:23,162-Speed 3270.12 
samples/sec Loss 5.3408 LearningRate 0.0370 Epoch: 7 Global Step: 97250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:26,250-Speed 3317.24 samples/sec Loss 5.4061 LearningRate 0.0370 Epoch: 7 Global Step: 97260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:29,407-Speed 3244.81 samples/sec Loss 5.3540 LearningRate 0.0370 Epoch: 7 Global Step: 97270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:32,506-Speed 3305.45 samples/sec Loss 5.3607 LearningRate 0.0370 Epoch: 7 Global Step: 97280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:35,604-Speed 3306.11 samples/sec Loss 5.4448 LearningRate 0.0370 Epoch: 7 Global Step: 97290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:07:38,704-Speed 3304.03 samples/sec Loss 5.2577 LearningRate 0.0370 Epoch: 7 Global Step: 97300 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:07:41,858-Speed 3247.70 samples/sec Loss 5.3200 LearningRate 0.0370 Epoch: 7 Global Step: 97310 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:07:44,936-Speed 3328.24 samples/sec Loss 5.2342 LearningRate 0.0370 Epoch: 7 Global Step: 97320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:07:48,067-Speed 3271.55 samples/sec Loss 5.2617 LearningRate 0.0370 Epoch: 7 Global Step: 97330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:07:51,202-Speed 3267.26 samples/sec Loss 5.2057 LearningRate 0.0370 Epoch: 7 Global Step: 97340 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:07:54,290-Speed 3317.84 samples/sec Loss 5.2471 LearningRate 0.0370 Epoch: 7 Global Step: 97350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:07:57,354-Speed 3342.36 samples/sec Loss 5.3251 LearningRate 0.0370 Epoch: 7 Global Step: 97360 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:08:00,481-Speed 3275.45 samples/sec Loss 5.3219 LearningRate 0.0370 Epoch: 7 
Global Step: 97370 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:08:03,621-Speed 3262.28 samples/sec Loss 5.2992 LearningRate 0.0370 Epoch: 7 Global Step: 97380 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:08:06,786-Speed 3236.84 samples/sec Loss 5.4098 LearningRate 0.0370 Epoch: 7 Global Step: 97390 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:08:09,839-Speed 3354.96 samples/sec Loss 5.3209 LearningRate 0.0370 Epoch: 7 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:13,032-Speed 3207.76 samples/sec Loss 5.3662 LearningRate 0.0370 Epoch: 7 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:16,173-Speed 3260.77 samples/sec Loss 5.3094 LearningRate 0.0369 Epoch: 7 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:19,296-Speed 3280.11 samples/sec Loss 5.3190 LearningRate 0.0369 Epoch: 7 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:22,357-Speed 3347.01 samples/sec Loss 5.3291 LearningRate 0.0369 Epoch: 7 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:25,534-Speed 3224.38 samples/sec Loss 5.3637 LearningRate 0.0369 Epoch: 7 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:28,616-Speed 3322.62 samples/sec Loss 5.2683 LearningRate 0.0369 Epoch: 7 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:31,707-Speed 3314.20 samples/sec Loss 5.3213 LearningRate 0.0369 Epoch: 7 Global Step: 97470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:34,794-Speed 3318.43 samples/sec Loss 5.3822 LearningRate 0.0369 Epoch: 7 Global Step: 97480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:37,882-Speed 3316.77 samples/sec Loss 5.2931 LearningRate 0.0369 Epoch: 7 Global Step: 97490 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:08:41,006-Speed 3279.05 samples/sec Loss 5.3817 LearningRate 0.0369 Epoch: 7 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:08:44,063-Speed 3350.79 samples/sec Loss 5.3955 LearningRate 0.0369 Epoch: 7 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:08:47,131-Speed 3338.54 samples/sec Loss 5.3838 LearningRate 0.0369 Epoch: 7 Global Step: 97520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:50,279-Speed 3254.27 samples/sec Loss 5.3697 LearningRate 0.0369 Epoch: 7 Global Step: 97530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:53,407-Speed 3274.29 samples/sec Loss 5.2682 LearningRate 0.0369 Epoch: 7 Global Step: 97540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:56,556-Speed 3253.11 samples/sec Loss 5.2334 LearningRate 0.0369 Epoch: 7 Global Step: 97550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:08:59,722-Speed 3235.06 samples/sec Loss 5.3541 LearningRate 0.0369 Epoch: 7 Global Step: 97560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:02,872-Speed 3251.42 samples/sec Loss 5.3903 LearningRate 0.0369 Epoch: 7 Global Step: 97570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:06,040-Speed 3234.14 samples/sec Loss 5.4108 LearningRate 0.0369 Epoch: 7 Global Step: 97580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:09,132-Speed 3312.62 samples/sec Loss 5.3145 LearningRate 0.0369 Epoch: 7 Global Step: 97590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:12,281-Speed 3252.68 samples/sec Loss 5.2598 LearningRate 0.0369 Epoch: 7 Global Step: 97600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:15,369-Speed 3316.95 samples/sec Loss 5.1759 LearningRate 0.0369 Epoch: 7 Global Step: 97610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:18,476-Speed 3296.73 
samples/sec Loss 5.4189 LearningRate 0.0368 Epoch: 7 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:21,539-Speed 3344.55 samples/sec Loss 5.3223 LearningRate 0.0368 Epoch: 7 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:24,601-Speed 3344.96 samples/sec Loss 5.3433 LearningRate 0.0368 Epoch: 7 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:27,762-Speed 3241.00 samples/sec Loss 5.3081 LearningRate 0.0368 Epoch: 7 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:30,922-Speed 3240.54 samples/sec Loss 5.3500 LearningRate 0.0368 Epoch: 7 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:33,979-Speed 3350.68 samples/sec Loss 5.3099 LearningRate 0.0368 Epoch: 7 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:37,161-Speed 3219.72 samples/sec Loss 5.3036 LearningRate 0.0368 Epoch: 7 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:40,345-Speed 3217.18 samples/sec Loss 5.4106 LearningRate 0.0368 Epoch: 7 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:43,459-Speed 3288.62 samples/sec Loss 5.2881 LearningRate 0.0368 Epoch: 7 Global Step: 97700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:46,559-Speed 3304.67 samples/sec Loss 5.2566 LearningRate 0.0368 Epoch: 7 Global Step: 97710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:49,637-Speed 3327.54 samples/sec Loss 5.3582 LearningRate 0.0368 Epoch: 7 Global Step: 97720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:09:52,715-Speed 3328.68 samples/sec Loss 5.3030 LearningRate 0.0368 Epoch: 7 Global Step: 97730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:55,781-Speed 3341.31 samples/sec Loss 5.2373 LearningRate 0.0368 Epoch: 7 
Global Step: 97740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:09:58,881-Speed 3304.34 samples/sec Loss 5.4239 LearningRate 0.0368 Epoch: 7 Global Step: 97750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:02,010-Speed 3273.60 samples/sec Loss 5.2807 LearningRate 0.0368 Epoch: 7 Global Step: 97760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:05,114-Speed 3299.68 samples/sec Loss 5.1740 LearningRate 0.0368 Epoch: 7 Global Step: 97770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:08,214-Speed 3305.23 samples/sec Loss 5.2758 LearningRate 0.0368 Epoch: 7 Global Step: 97780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:11,335-Speed 3281.18 samples/sec Loss 5.3539 LearningRate 0.0368 Epoch: 7 Global Step: 97790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:14,414-Speed 3326.45 samples/sec Loss 5.2208 LearningRate 0.0368 Epoch: 7 Global Step: 97800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:17,540-Speed 3277.74 samples/sec Loss 5.3331 LearningRate 0.0368 Epoch: 7 Global Step: 97810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:20,634-Speed 3310.61 samples/sec Loss 5.2861 LearningRate 0.0368 Epoch: 7 Global Step: 97820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:10:23,713-Speed 3326.56 samples/sec Loss 5.3295 LearningRate 0.0367 Epoch: 7 Global Step: 97830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:26,875-Speed 3240.30 samples/sec Loss 5.2509 LearningRate 0.0367 Epoch: 7 Global Step: 97840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:29,967-Speed 3312.47 samples/sec Loss 5.3304 LearningRate 0.0367 Epoch: 7 Global Step: 97850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:33,045-Speed 3326.93 samples/sec Loss 5.2512 LearningRate 0.0367 Epoch: 7 Global Step: 97860 Fp16 Grad Scale: 65536 Required: 13 
hours Training: 2022-04-27 10:10:36,151-Speed 3298.20 samples/sec Loss 5.2899 LearningRate 0.0367 Epoch: 7 Global Step: 97870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:39,268-Speed 3286.91 samples/sec Loss 5.2612 LearningRate 0.0367 Epoch: 7 Global Step: 97880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:42,393-Speed 3276.80 samples/sec Loss 5.3599 LearningRate 0.0367 Epoch: 7 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:45,500-Speed 3297.13 samples/sec Loss 5.2396 LearningRate 0.0367 Epoch: 7 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:48,618-Speed 3285.75 samples/sec Loss 5.4024 LearningRate 0.0367 Epoch: 7 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:51,800-Speed 3218.53 samples/sec Loss 5.1433 LearningRate 0.0367 Epoch: 7 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:10:54,918-Speed 3284.93 samples/sec Loss 5.3386 LearningRate 0.0367 Epoch: 7 Global Step: 97930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 10:10:57,974-Speed 3352.85 samples/sec Loss 5.3487 LearningRate 0.0367 Epoch: 7 Global Step: 97940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:11:01,078-Speed 3299.27 samples/sec Loss 5.2763 LearningRate 0.0367 Epoch: 7 Global Step: 97950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:04,152-Speed 3332.35 samples/sec Loss 5.3030 LearningRate 0.0367 Epoch: 7 Global Step: 97960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:07,261-Speed 3295.17 samples/sec Loss 5.2652 LearningRate 0.0367 Epoch: 7 Global Step: 97970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:10,329-Speed 3338.57 samples/sec Loss 5.3503 LearningRate 0.0367 Epoch: 7 Global Step: 97980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:13,434-Speed 3298.88 
samples/sec Loss 5.4023 LearningRate 0.0367 Epoch: 7 Global Step: 97990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:16,519-Speed 3320.26 samples/sec Loss 5.3475 LearningRate 0.0367 Epoch: 7 Global Step: 98000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:19,718-Speed 3201.79 samples/sec Loss 5.4027 LearningRate 0.0367 Epoch: 7 Global Step: 98010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:22,805-Speed 3318.51 samples/sec Loss 5.4013 LearningRate 0.0367 Epoch: 7 Global Step: 98020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:25,917-Speed 3291.75 samples/sec Loss 5.3231 LearningRate 0.0366 Epoch: 7 Global Step: 98030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:29,083-Speed 3234.85 samples/sec Loss 5.3877 LearningRate 0.0366 Epoch: 7 Global Step: 98040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:32,176-Speed 3311.77 samples/sec Loss 5.3070 LearningRate 0.0366 Epoch: 7 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:11:35,286-Speed 3294.08 samples/sec Loss 5.2952 LearningRate 0.0366 Epoch: 7 Global Step: 98060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:38,436-Speed 3251.64 samples/sec Loss 5.2545 LearningRate 0.0366 Epoch: 7 Global Step: 98070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:41,576-Speed 3261.64 samples/sec Loss 5.3349 LearningRate 0.0366 Epoch: 7 Global Step: 98080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:44,723-Speed 3254.78 samples/sec Loss 5.3602 LearningRate 0.0366 Epoch: 7 Global Step: 98090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:47,844-Speed 3282.18 samples/sec Loss 5.3553 LearningRate 0.0366 Epoch: 7 Global Step: 98100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:50,944-Speed 3304.68 samples/sec Loss 5.4187 LearningRate 0.0366 Epoch: 7 
Global Step: 98110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:54,101-Speed 3243.81 samples/sec Loss 5.3494 LearningRate 0.0366 Epoch: 7 Global Step: 98120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:11:57,200-Speed 3306.38 samples/sec Loss 5.2388 LearningRate 0.0366 Epoch: 7 Global Step: 98130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:00,296-Speed 3308.00 samples/sec Loss 5.2795 LearningRate 0.0366 Epoch: 7 Global Step: 98140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:03,373-Speed 3329.22 samples/sec Loss 5.2887 LearningRate 0.0366 Epoch: 7 Global Step: 98150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:06,493-Speed 3283.36 samples/sec Loss 5.2669 LearningRate 0.0366 Epoch: 7 Global Step: 98160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:09,568-Speed 3330.93 samples/sec Loss 5.3079 LearningRate 0.0366 Epoch: 7 Global Step: 98170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:12,642-Speed 3332.20 samples/sec Loss 5.3579 LearningRate 0.0366 Epoch: 7 Global Step: 98180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:15,754-Speed 3291.29 samples/sec Loss 5.3453 LearningRate 0.0366 Epoch: 7 Global Step: 98190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:18,871-Speed 3286.07 samples/sec Loss 5.3292 LearningRate 0.0366 Epoch: 7 Global Step: 98200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:21,952-Speed 3324.81 samples/sec Loss 5.2834 LearningRate 0.0366 Epoch: 7 Global Step: 98210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:25,050-Speed 3306.50 samples/sec Loss 5.3336 LearningRate 0.0366 Epoch: 7 Global Step: 98220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:28,194-Speed 3257.92 samples/sec Loss 5.2924 LearningRate 0.0366 Epoch: 7 Global Step: 98230 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:12:31,298-Speed 3299.13 samples/sec Loss 5.3563 LearningRate 0.0365 Epoch: 7 Global Step: 98240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:34,386-Speed 3317.41 samples/sec Loss 5.3787 LearningRate 0.0365 Epoch: 7 Global Step: 98250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:12:37,478-Speed 3313.48 samples/sec Loss 5.2331 LearningRate 0.0365 Epoch: 7 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:12:40,577-Speed 3304.48 samples/sec Loss 5.3427 LearningRate 0.0365 Epoch: 7 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:12:43,752-Speed 3226.40 samples/sec Loss 5.2757 LearningRate 0.0365 Epoch: 7 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:12:46,833-Speed 3325.39 samples/sec Loss 5.2819 LearningRate 0.0365 Epoch: 7 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:12:49,914-Speed 3323.51 samples/sec Loss 5.3339 LearningRate 0.0365 Epoch: 7 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:12:53,079-Speed 3237.45 samples/sec Loss 5.3719 LearningRate 0.0365 Epoch: 7 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:12:56,184-Speed 3298.96 samples/sec Loss 5.3367 LearningRate 0.0365 Epoch: 7 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:12:59,254-Speed 3336.11 samples/sec Loss 5.3732 LearningRate 0.0365 Epoch: 7 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:13:02,372-Speed 3285.48 samples/sec Loss 5.3313 LearningRate 0.0365 Epoch: 7 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:13:05,454-Speed 3323.25 samples/sec Loss 5.2821 LearningRate 0.0365 Epoch: 7 Global Step: 98350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:08,563-Speed 3294.11 
samples/sec Loss 5.3477 LearningRate 0.0365 Epoch: 7 Global Step: 98360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:11,665-Speed 3302.49 samples/sec Loss 5.2297 LearningRate 0.0365 Epoch: 7 Global Step: 98370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:14,778-Speed 3290.92 samples/sec Loss 5.3586 LearningRate 0.0365 Epoch: 7 Global Step: 98380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:17,957-Speed 3221.94 samples/sec Loss 5.3150 LearningRate 0.0365 Epoch: 7 Global Step: 98390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:21,021-Speed 3342.57 samples/sec Loss 5.1963 LearningRate 0.0365 Epoch: 7 Global Step: 98400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:24,119-Speed 3307.13 samples/sec Loss 5.2796 LearningRate 0.0365 Epoch: 7 Global Step: 98410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:27,275-Speed 3245.53 samples/sec Loss 5.3834 LearningRate 0.0365 Epoch: 7 Global Step: 98420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:30,409-Speed 3268.04 samples/sec Loss 5.3290 LearningRate 0.0365 Epoch: 7 Global Step: 98430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:33,502-Speed 3311.27 samples/sec Loss 5.1646 LearningRate 0.0364 Epoch: 7 Global Step: 98440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:36,665-Speed 3238.38 samples/sec Loss 5.3354 LearningRate 0.0364 Epoch: 7 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:13:39,766-Speed 3303.60 samples/sec Loss 5.2921 LearningRate 0.0364 Epoch: 7 Global Step: 98460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:42,899-Speed 3269.55 samples/sec Loss 5.2405 LearningRate 0.0364 Epoch: 7 Global Step: 98470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:45,982-Speed 3321.77 samples/sec Loss 5.3388 LearningRate 0.0364 Epoch: 7 
Global Step: 98480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:49,098-Speed 3287.57 samples/sec Loss 5.3090 LearningRate 0.0364 Epoch: 7 Global Step: 98490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:52,287-Speed 3211.69 samples/sec Loss 5.2934 LearningRate 0.0364 Epoch: 7 Global Step: 98500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:55,369-Speed 3324.41 samples/sec Loss 5.2534 LearningRate 0.0364 Epoch: 7 Global Step: 98510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:13:58,465-Speed 3307.35 samples/sec Loss 5.2529 LearningRate 0.0364 Epoch: 7 Global Step: 98520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:01,537-Speed 3335.01 samples/sec Loss 5.3135 LearningRate 0.0364 Epoch: 7 Global Step: 98530 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:04,643-Speed 3297.78 samples/sec Loss 5.2913 LearningRate 0.0364 Epoch: 7 Global Step: 98540 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:07,751-Speed 3296.24 samples/sec Loss 5.3350 LearningRate 0.0364 Epoch: 7 Global Step: 98550 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:10,834-Speed 3322.27 samples/sec Loss 5.2213 LearningRate 0.0364 Epoch: 7 Global Step: 98560 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:13,919-Speed 3320.51 samples/sec Loss 5.3261 LearningRate 0.0364 Epoch: 7 Global Step: 98570 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:17,017-Speed 3305.97 samples/sec Loss 5.2470 LearningRate 0.0364 Epoch: 7 Global Step: 98580 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:20,116-Speed 3305.77 samples/sec Loss 5.3164 LearningRate 0.0364 Epoch: 7 Global Step: 98590 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:23,210-Speed 3310.44 samples/sec Loss 5.2377 LearningRate 0.0364 Epoch: 7 Global Step: 98600 Fp16 Grad Scale: 16384 Required: 13 
hours Training: 2022-04-27 10:14:26,305-Speed 3309.76 samples/sec Loss 5.3436 LearningRate 0.0364 Epoch: 7 Global Step: 98610 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:29,481-Speed 3224.94 samples/sec Loss 5.3672 LearningRate 0.0364 Epoch: 7 Global Step: 98620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:14:32,593-Speed 3291.26 samples/sec Loss 5.2991 LearningRate 0.0364 Epoch: 7 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:35,663-Speed 3336.69 samples/sec Loss 5.3696 LearningRate 0.0364 Epoch: 7 Global Step: 98640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:38,761-Speed 3306.47 samples/sec Loss 5.3610 LearningRate 0.0363 Epoch: 7 Global Step: 98650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:41,910-Speed 3253.05 samples/sec Loss 5.2983 LearningRate 0.0363 Epoch: 7 Global Step: 98660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:45,004-Speed 3310.16 samples/sec Loss 5.3563 LearningRate 0.0363 Epoch: 7 Global Step: 98670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:48,088-Speed 3320.99 samples/sec Loss 5.2679 LearningRate 0.0363 Epoch: 7 Global Step: 98680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:51,216-Speed 3275.53 samples/sec Loss 5.2920 LearningRate 0.0363 Epoch: 7 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:54,390-Speed 3226.71 samples/sec Loss 5.4095 LearningRate 0.0363 Epoch: 7 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:14:57,496-Speed 3298.44 samples/sec Loss 5.3302 LearningRate 0.0363 Epoch: 7 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:00,634-Speed 3264.19 samples/sec Loss 5.3392 LearningRate 0.0363 Epoch: 7 Global Step: 98720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:03,763-Speed 3273.67 
samples/sec Loss 5.3393 LearningRate 0.0363 Epoch: 7 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:06,909-Speed 3255.38 samples/sec Loss 5.2968 LearningRate 0.0363 Epoch: 7 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:09,983-Speed 3332.12 samples/sec Loss 5.2987 LearningRate 0.0363 Epoch: 7 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:13,123-Speed 3262.71 samples/sec Loss 5.4006 LearningRate 0.0363 Epoch: 7 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:16,251-Speed 3274.84 samples/sec Loss 5.3429 LearningRate 0.0363 Epoch: 7 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:19,348-Speed 3306.90 samples/sec Loss 5.2408 LearningRate 0.0363 Epoch: 7 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:22,501-Speed 3248.33 samples/sec Loss 5.1378 LearningRate 0.0363 Epoch: 7 Global Step: 98790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:25,635-Speed 3268.55 samples/sec Loss 5.3084 LearningRate 0.0363 Epoch: 7 Global Step: 98800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:28,763-Speed 3276.46 samples/sec Loss 5.2349 LearningRate 0.0363 Epoch: 7 Global Step: 98810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:31,908-Speed 3257.10 samples/sec Loss 5.3220 LearningRate 0.0363 Epoch: 7 Global Step: 98820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:35,027-Speed 3283.14 samples/sec Loss 5.2680 LearningRate 0.0363 Epoch: 7 Global Step: 98830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:15:38,169-Speed 3260.45 samples/sec Loss 5.3436 LearningRate 0.0363 Epoch: 7 Global Step: 98840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:41,277-Speed 3296.36 samples/sec Loss 5.3409 LearningRate 0.0363 Epoch: 7 
Global Step: 98850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:44,393-Speed 3287.09 samples/sec Loss 5.4131 LearningRate 0.0362 Epoch: 7 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:47,526-Speed 3269.45 samples/sec Loss 5.2763 LearningRate 0.0362 Epoch: 7 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:50,653-Speed 3274.76 samples/sec Loss 5.3010 LearningRate 0.0362 Epoch: 7 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:53,768-Speed 3288.94 samples/sec Loss 5.3136 LearningRate 0.0362 Epoch: 7 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:56,871-Speed 3300.39 samples/sec Loss 5.3794 LearningRate 0.0362 Epoch: 7 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:15:59,972-Speed 3303.34 samples/sec Loss 5.3712 LearningRate 0.0362 Epoch: 7 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:16:03,112-Speed 3262.01 samples/sec Loss 5.3642 LearningRate 0.0362 Epoch: 7 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:16:06,242-Speed 3272.69 samples/sec Loss 5.3784 LearningRate 0.0362 Epoch: 7 Global Step: 98930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:09,308-Speed 3341.16 samples/sec Loss 5.3255 LearningRate 0.0362 Epoch: 7 Global Step: 98940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:12,407-Speed 3305.87 samples/sec Loss 5.3017 LearningRate 0.0362 Epoch: 7 Global Step: 98950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:15,609-Speed 3198.21 samples/sec Loss 5.2989 LearningRate 0.0362 Epoch: 7 Global Step: 98960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:18,761-Speed 3249.65 samples/sec Loss 5.2659 LearningRate 0.0362 Epoch: 7 Global Step: 98970 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:16:21,890-Speed 3273.98 samples/sec Loss 5.3141 LearningRate 0.0362 Epoch: 7 Global Step: 98980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:24,993-Speed 3301.72 samples/sec Loss 5.3104 LearningRate 0.0362 Epoch: 7 Global Step: 98990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:28,095-Speed 3301.13 samples/sec Loss 5.3123 LearningRate 0.0362 Epoch: 7 Global Step: 99000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:31,275-Speed 3221.75 samples/sec Loss 5.3152 LearningRate 0.0362 Epoch: 7 Global Step: 99010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:34,378-Speed 3301.33 samples/sec Loss 5.1789 LearningRate 0.0362 Epoch: 7 Global Step: 99020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:37,567-Speed 3212.24 samples/sec Loss 5.3635 LearningRate 0.0362 Epoch: 7 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:16:40,693-Speed 3277.03 samples/sec Loss 5.3338 LearningRate 0.0362 Epoch: 7 Global Step: 99040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:16:43,844-Speed 3250.20 samples/sec Loss 5.3363 LearningRate 0.0362 Epoch: 7 Global Step: 99050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:16:46,918-Speed 3332.86 samples/sec Loss 5.2045 LearningRate 0.0361 Epoch: 7 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:16:50,053-Speed 3266.58 samples/sec Loss 5.3575 LearningRate 0.0361 Epoch: 7 Global Step: 99070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:53,159-Speed 3299.41 samples/sec Loss 5.2544 LearningRate 0.0361 Epoch: 7 Global Step: 99080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:16:56,199-Speed 3368.47 samples/sec Loss 5.3941 LearningRate 0.0361 Epoch: 7 Global Step: 99090 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:16:59,321-Speed 3281.45 
samples/sec Loss 5.3243 LearningRate 0.0361 Epoch: 7 Global Step: 99100 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:02,526-Speed 3196.28 samples/sec Loss 5.3282 LearningRate 0.0361 Epoch: 7 Global Step: 99110 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:05,650-Speed 3278.15 samples/sec Loss 5.3278 LearningRate 0.0361 Epoch: 7 Global Step: 99120 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:08,705-Speed 3353.46 samples/sec Loss 5.3814 LearningRate 0.0361 Epoch: 7 Global Step: 99130 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:11,812-Speed 3297.14 samples/sec Loss 5.2240 LearningRate 0.0361 Epoch: 7 Global Step: 99140 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:14,883-Speed 3335.34 samples/sec Loss 5.3870 LearningRate 0.0361 Epoch: 7 Global Step: 99150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:18,007-Speed 3278.75 samples/sec Loss 5.2867 LearningRate 0.0361 Epoch: 7 Global Step: 99160 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:21,071-Speed 3343.32 samples/sec Loss 5.2553 LearningRate 0.0361 Epoch: 7 Global Step: 99170 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:24,124-Speed 3355.49 samples/sec Loss 5.3868 LearningRate 0.0361 Epoch: 7 Global Step: 99180 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:17:27,197-Speed 3333.09 samples/sec Loss 5.3271 LearningRate 0.0361 Epoch: 7 Global Step: 99190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:30,307-Speed 3294.41 samples/sec Loss 5.0953 LearningRate 0.0361 Epoch: 7 Global Step: 99200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:33,396-Speed 3315.37 samples/sec Loss 5.2916 LearningRate 0.0361 Epoch: 7 Global Step: 99210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:36,541-Speed 3257.18 samples/sec Loss 5.2514 LearningRate 0.0361 Epoch: 7 
Global Step: 99220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:39,647-Speed 3297.63 samples/sec Loss 5.3067 LearningRate 0.0361 Epoch: 7 Global Step: 99230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:42,853-Speed 3195.10 samples/sec Loss 5.2032 LearningRate 0.0361 Epoch: 7 Global Step: 99240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:45,921-Speed 3338.22 samples/sec Loss 5.4143 LearningRate 0.0361 Epoch: 7 Global Step: 99250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:49,050-Speed 3273.65 samples/sec Loss 5.2581 LearningRate 0.0361 Epoch: 7 Global Step: 99260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:52,197-Speed 3254.70 samples/sec Loss 5.3013 LearningRate 0.0360 Epoch: 7 Global Step: 99270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:55,387-Speed 3211.70 samples/sec Loss 5.3604 LearningRate 0.0360 Epoch: 7 Global Step: 99280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:17:58,475-Speed 3316.80 samples/sec Loss 5.2102 LearningRate 0.0360 Epoch: 7 Global Step: 99290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:18:01,567-Speed 3313.57 samples/sec Loss 5.2439 LearningRate 0.0360 Epoch: 7 Global Step: 99300 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:04,720-Speed 3248.60 samples/sec Loss 5.2453 LearningRate 0.0360 Epoch: 7 Global Step: 99310 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:07,797-Speed 3327.79 samples/sec Loss 5.2922 LearningRate 0.0360 Epoch: 7 Global Step: 99320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:10,856-Speed 3348.97 samples/sec Loss 5.3457 LearningRate 0.0360 Epoch: 7 Global Step: 99330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:13,989-Speed 3269.71 samples/sec Loss 5.2908 LearningRate 0.0360 Epoch: 7 Global Step: 99340 Fp16 Grad Scale: 16384 Required: 13 
hours Training: 2022-04-27 10:18:17,074-Speed 3320.49 samples/sec Loss 5.2586 LearningRate 0.0360 Epoch: 7 Global Step: 99350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:20,349-Speed 3126.90 samples/sec Loss 5.2860 LearningRate 0.0360 Epoch: 7 Global Step: 99360 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:52,330-Speed 320.21 samples/sec Loss 5.0248 LearningRate 0.0360 Epoch: 8 Global Step: 99370 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:55,810-Speed 2944.42 samples/sec Loss 3.8714 LearningRate 0.0360 Epoch: 8 Global Step: 99380 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:18:58,996-Speed 3215.02 samples/sec Loss 3.9765 LearningRate 0.0360 Epoch: 8 Global Step: 99390 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:19:02,104-Speed 3295.80 samples/sec Loss 3.8848 LearningRate 0.0360 Epoch: 8 Global Step: 99400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:05,246-Speed 3259.87 samples/sec Loss 3.9439 LearningRate 0.0360 Epoch: 8 Global Step: 99410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:08,320-Speed 3332.08 samples/sec Loss 3.9092 LearningRate 0.0360 Epoch: 8 Global Step: 99420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:11,475-Speed 3247.14 samples/sec Loss 3.9495 LearningRate 0.0360 Epoch: 8 Global Step: 99430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:14,625-Speed 3251.56 samples/sec Loss 3.8960 LearningRate 0.0360 Epoch: 8 Global Step: 99440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:17,711-Speed 3319.54 samples/sec Loss 3.9471 LearningRate 0.0360 Epoch: 8 Global Step: 99450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:20,761-Speed 3358.61 samples/sec Loss 4.0967 LearningRate 0.0360 Epoch: 8 Global Step: 99460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:23,882-Speed 3281.59 
samples/sec Loss 4.0013 LearningRate 0.0360 Epoch: 8 Global Step: 99470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:26,982-Speed 3304.46 samples/sec Loss 4.0172 LearningRate 0.0359 Epoch: 8 Global Step: 99480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:30,111-Speed 3273.37 samples/sec Loss 3.9441 LearningRate 0.0359 Epoch: 8 Global Step: 99490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:33,200-Speed 3315.76 samples/sec Loss 4.0096 LearningRate 0.0359 Epoch: 8 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:19:36,250-Speed 3358.84 samples/sec Loss 3.9805 LearningRate 0.0359 Epoch: 8 Global Step: 99510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:39,364-Speed 3289.12 samples/sec Loss 3.9907 LearningRate 0.0359 Epoch: 8 Global Step: 99520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:42,508-Speed 3258.27 samples/sec Loss 3.9445 LearningRate 0.0359 Epoch: 8 Global Step: 99530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:45,607-Speed 3304.83 samples/sec Loss 3.9679 LearningRate 0.0359 Epoch: 8 Global Step: 99540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:48,670-Speed 3344.49 samples/sec Loss 3.8933 LearningRate 0.0359 Epoch: 8 Global Step: 99550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:51,867-Speed 3204.03 samples/sec Loss 4.0291 LearningRate 0.0359 Epoch: 8 Global Step: 99560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:19:54,923-Speed 3352.49 samples/sec Loss 4.1155 LearningRate 0.0359 Epoch: 8 Global Step: 99570 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:19:57,988-Speed 3342.01 samples/sec Loss 3.9871 LearningRate 0.0359 Epoch: 8 Global Step: 99580 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:01,078-Speed 3314.81 samples/sec Loss 4.0706 LearningRate 0.0359 Epoch: 8 
Global Step: 99590 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:04,268-Speed 3210.76 samples/sec Loss 4.1223 LearningRate 0.0359 Epoch: 8 Global Step: 99600 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:07,434-Speed 3235.47 samples/sec Loss 3.9885 LearningRate 0.0359 Epoch: 8 Global Step: 99610 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:10,503-Speed 3337.56 samples/sec Loss 3.9901 LearningRate 0.0359 Epoch: 8 Global Step: 99620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:13,580-Speed 3328.32 samples/sec Loss 4.0306 LearningRate 0.0359 Epoch: 8 Global Step: 99630 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:16,651-Speed 3336.13 samples/sec Loss 4.0501 LearningRate 0.0359 Epoch: 8 Global Step: 99640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:19,723-Speed 3334.74 samples/sec Loss 4.0325 LearningRate 0.0359 Epoch: 8 Global Step: 99650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:22,804-Speed 3324.33 samples/sec Loss 3.9564 LearningRate 0.0359 Epoch: 8 Global Step: 99660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:20:25,878-Speed 3332.14 samples/sec Loss 4.1094 LearningRate 0.0359 Epoch: 8 Global Step: 99670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:28,973-Speed 3309.12 samples/sec Loss 4.0401 LearningRate 0.0358 Epoch: 8 Global Step: 99680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:32,095-Speed 3281.21 samples/sec Loss 4.0979 LearningRate 0.0358 Epoch: 8 Global Step: 99690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:35,228-Speed 3269.72 samples/sec Loss 4.0431 LearningRate 0.0358 Epoch: 8 Global Step: 99700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:38,318-Speed 3314.84 samples/sec Loss 4.0330 LearningRate 0.0358 Epoch: 8 Global Step: 99710 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 10:20:41,408-Speed 3314.81 samples/sec Loss 4.0791 LearningRate 0.0358 Epoch: 8 Global Step: 99720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:44,517-Speed 3295.11 samples/sec Loss 4.1065 LearningRate 0.0358 Epoch: 8 Global Step: 99730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:47,641-Speed 3279.39 samples/sec Loss 3.9645 LearningRate 0.0358 Epoch: 8 Global Step: 99740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:50,771-Speed 3272.56 samples/sec Loss 4.0449 LearningRate 0.0358 Epoch: 8 Global Step: 99750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:53,918-Speed 3254.44 samples/sec Loss 4.0712 LearningRate 0.0358 Epoch: 8 Global Step: 99760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:20:57,007-Speed 3315.93 samples/sec Loss 4.0807 LearningRate 0.0358 Epoch: 8 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:21:00,135-Speed 3275.27 samples/sec Loss 4.0892 LearningRate 0.0358 Epoch: 8 Global Step: 99780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:03,295-Speed 3242.01 samples/sec Loss 4.0776 LearningRate 0.0358 Epoch: 8 Global Step: 99790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:06,406-Speed 3291.88 samples/sec Loss 4.0081 LearningRate 0.0358 Epoch: 8 Global Step: 99800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:09,476-Speed 3337.50 samples/sec Loss 4.1015 LearningRate 0.0358 Epoch: 8 Global Step: 99810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:12,534-Speed 3349.27 samples/sec Loss 4.0634 LearningRate 0.0358 Epoch: 8 Global Step: 99820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:15,624-Speed 3314.47 samples/sec Loss 4.0823 LearningRate 0.0358 Epoch: 8 Global Step: 99830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:18,735-Speed 3292.20 
samples/sec Loss 4.1284 LearningRate 0.0358 Epoch: 8 Global Step: 99840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:21,803-Speed 3339.20 samples/sec Loss 4.1181 LearningRate 0.0358 Epoch: 8 Global Step: 99850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:24,878-Speed 3331.35 samples/sec Loss 4.0977 LearningRate 0.0358 Epoch: 8 Global Step: 99860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:27,982-Speed 3299.49 samples/sec Loss 4.1276 LearningRate 0.0358 Epoch: 8 Global Step: 99870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:31,088-Speed 3298.12 samples/sec Loss 4.0295 LearningRate 0.0358 Epoch: 8 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:21:34,214-Speed 3277.53 samples/sec Loss 4.0496 LearningRate 0.0357 Epoch: 8 Global Step: 99890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:37,328-Speed 3288.49 samples/sec Loss 4.1453 LearningRate 0.0357 Epoch: 8 Global Step: 99900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:40,461-Speed 3270.08 samples/sec Loss 4.1311 LearningRate 0.0357 Epoch: 8 Global Step: 99910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:43,650-Speed 3212.05 samples/sec Loss 3.9862 LearningRate 0.0357 Epoch: 8 Global Step: 99920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:46,746-Speed 3308.81 samples/sec Loss 4.1288 LearningRate 0.0357 Epoch: 8 Global Step: 99930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:49,819-Speed 3332.90 samples/sec Loss 4.1109 LearningRate 0.0357 Epoch: 8 Global Step: 99940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:52,918-Speed 3305.47 samples/sec Loss 4.0365 LearningRate 0.0357 Epoch: 8 Global Step: 99950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:56,018-Speed 3304.35 samples/sec Loss 4.1290 LearningRate 0.0357 Epoch: 8 
Global Step: 99960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:21:59,074-Speed 3351.12 samples/sec Loss 4.2394 LearningRate 0.0357 Epoch: 8 Global Step: 99970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:02,198-Speed 3279.07 samples/sec Loss 4.1392 LearningRate 0.0357 Epoch: 8 Global Step: 99980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:05,301-Speed 3301.52 samples/sec Loss 4.1410 LearningRate 0.0357 Epoch: 8 Global Step: 99990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:08,397-Speed 3308.77 samples/sec Loss 4.1510 LearningRate 0.0357 Epoch: 8 Global Step: 100000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:11,530-Speed 3269.60 samples/sec Loss 4.1776 LearningRate 0.0357 Epoch: 8 Global Step: 100010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:14,716-Speed 3215.35 samples/sec Loss 4.1108 LearningRate 0.0357 Epoch: 8 Global Step: 100020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:17,857-Speed 3260.33 samples/sec Loss 4.1873 LearningRate 0.0357 Epoch: 8 Global Step: 100030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:20,937-Speed 3325.73 samples/sec Loss 4.1305 LearningRate 0.0357 Epoch: 8 Global Step: 100040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:24,028-Speed 3314.17 samples/sec Loss 4.0995 LearningRate 0.0357 Epoch: 8 Global Step: 100050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:27,133-Speed 3298.88 samples/sec Loss 4.1247 LearningRate 0.0357 Epoch: 8 Global Step: 100060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:30,211-Speed 3327.98 samples/sec Loss 4.1709 LearningRate 0.0357 Epoch: 8 Global Step: 100070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:22:33,283-Speed 3334.42 samples/sec Loss 4.2496 LearningRate 0.0357 Epoch: 8 Global Step: 100080 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 10:22:36,415-Speed 3270.69 samples/sec Loss 4.1849 LearningRate 0.0357 Epoch: 8 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:22:39,582-Speed 3234.30 samples/sec Loss 4.1339 LearningRate 0.0356 Epoch: 8 Global Step: 100100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:22:42,786-Speed 3197.31 samples/sec Loss 4.0925 LearningRate 0.0356 Epoch: 8 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:22:45,878-Speed 3312.81 samples/sec Loss 4.1587 LearningRate 0.0356 Epoch: 8 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:22:49,032-Speed 3247.15 samples/sec Loss 4.1364 LearningRate 0.0356 Epoch: 8 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:22:52,124-Speed 3313.23 samples/sec Loss 4.0141 LearningRate 0.0356 Epoch: 8 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:22:55,285-Speed 3240.67 samples/sec Loss 4.1857 LearningRate 0.0356 Epoch: 8 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:22:58,407-Speed 3281.51 samples/sec Loss 4.1863 LearningRate 0.0356 Epoch: 8 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:23:01,485-Speed 3327.49 samples/sec Loss 4.2034 LearningRate 0.0356 Epoch: 8 Global Step: 100170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:23:04,557-Speed 3334.67 samples/sec Loss 4.2512 LearningRate 0.0356 Epoch: 8 Global Step: 100180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:23:07,713-Speed 3245.77 samples/sec Loss 4.1665 LearningRate 0.0356 Epoch: 8 Global Step: 100190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:23:10,805-Speed 3312.76 samples/sec Loss 4.1044 LearningRate 0.0356 Epoch: 8 Global Step: 100200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
10:23:13,959-Speed 3247.12 samples/sec Loss 4.2928 LearningRate 0.0356 Epoch: 8 Global Step: 100210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:23:17,144-Speed 3216.86 samples/sec Loss 4.2474 LearningRate 0.0356 Epoch: 8 Global Step: 100220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:23:20,227-Speed 3321.60 samples/sec Loss 4.1552 LearningRate 0.0356 Epoch: 8 Global Step: 100230 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:23,312-Speed 3320.15 samples/sec Loss 4.1870 LearningRate 0.0356 Epoch: 8 Global Step: 100240 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:26,460-Speed 3254.82 samples/sec Loss 4.3176 LearningRate 0.0356 Epoch: 8 Global Step: 100250 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:29,566-Speed 3298.11 samples/sec Loss 4.1442 LearningRate 0.0356 Epoch: 8 Global Step: 100260 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:32,640-Speed 3331.61 samples/sec Loss 4.0792 LearningRate 0.0356 Epoch: 8 Global Step: 100270 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:35,800-Speed 3242.17 samples/sec Loss 4.1793 LearningRate 0.0356 Epoch: 8 Global Step: 100280 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:38,889-Speed 3315.67 samples/sec Loss 4.1931 LearningRate 0.0356 Epoch: 8 Global Step: 100290 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:42,018-Speed 3273.61 samples/sec Loss 4.1499 LearningRate 0.0356 Epoch: 8 Global Step: 100300 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:45,093-Speed 3331.63 samples/sec Loss 4.2114 LearningRate 0.0355 Epoch: 8 Global Step: 100310 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:48,148-Speed 3352.50 samples/sec Loss 4.3432 LearningRate 0.0355 Epoch: 8 Global Step: 100320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:23:51,213-Speed 3342.16 samples/sec Loss 
4.3057 LearningRate 0.0355 Epoch: 8 Global Step: 100330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:23:54,354-Speed 3261.15 samples/sec Loss 4.2114 LearningRate 0.0355 Epoch: 8 Global Step: 100340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:23:57,444-Speed 3315.04 samples/sec Loss 4.2507 LearningRate 0.0355 Epoch: 8 Global Step: 100350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:24:00,504-Speed 3348.07 samples/sec Loss 4.2276 LearningRate 0.0355 Epoch: 8 Global Step: 100360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:24:03,596-Speed 3312.81 samples/sec Loss 4.2397 LearningRate 0.0355 Epoch: 8 Global Step: 100370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:24:06,702-Speed 3297.49 samples/sec Loss 4.1527 LearningRate 0.0355 Epoch: 8 Global Step: 100380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:24:09,794-Speed 3313.62 samples/sec Loss 4.1830 LearningRate 0.0355 Epoch: 8 Global Step: 100390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:24:12,878-Speed 3320.88 samples/sec Loss 4.2589 LearningRate 0.0355 Epoch: 8 Global Step: 100400 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:15,997-Speed 3284.01 samples/sec Loss 4.2895 LearningRate 0.0355 Epoch: 8 Global Step: 100410 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:19,109-Speed 3291.26 samples/sec Loss 4.2595 LearningRate 0.0355 Epoch: 8 Global Step: 100420 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:22,200-Speed 3314.13 samples/sec Loss 4.2148 LearningRate 0.0355 Epoch: 8 Global Step: 100430 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:25,368-Speed 3233.32 samples/sec Loss 4.2618 LearningRate 0.0355 Epoch: 8 Global Step: 100440 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:28,480-Speed 3291.27 samples/sec Loss 4.1622 LearningRate 0.0355 Epoch: 8 Global 
Step: 100450 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:31,550-Speed 3336.73 samples/sec Loss 4.1782 LearningRate 0.0355 Epoch: 8 Global Step: 100460 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:34,607-Speed 3350.99 samples/sec Loss 4.2292 LearningRate 0.0355 Epoch: 8 Global Step: 100470 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:37,762-Speed 3246.71 samples/sec Loss 4.1567 LearningRate 0.0355 Epoch: 8 Global Step: 100480 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:24:40,837-Speed 3330.80 samples/sec Loss 4.2252 LearningRate 0.0355 Epoch: 8 Global Step: 100490 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:24:43,939-Speed 3301.98 samples/sec Loss 4.4533 LearningRate 0.0355 Epoch: 8 Global Step: 100500 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:24:47,098-Speed 3242.97 samples/sec Loss 4.2820 LearningRate 0.0355 Epoch: 8 Global Step: 100510 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:24:50,168-Speed 3336.20 samples/sec Loss 4.2151 LearningRate 0.0354 Epoch: 8 Global Step: 100520 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:24:53,353-Speed 3216.40 samples/sec Loss 4.2765 LearningRate 0.0354 Epoch: 8 Global Step: 100530 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:24:56,446-Speed 3311.39 samples/sec Loss 4.2467 LearningRate 0.0354 Epoch: 8 Global Step: 100540 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:24:59,583-Speed 3265.33 samples/sec Loss 4.2179 LearningRate 0.0354 Epoch: 8 Global Step: 100550 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:25:02,714-Speed 3271.81 samples/sec Loss 4.3416 LearningRate 0.0354 Epoch: 8 Global Step: 100560 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:25:05,838-Speed 3278.84 samples/sec Loss 4.2893 LearningRate 0.0354 Epoch: 8 Global Step: 100570 Fp16 Grad Scale: 8192 Required: 13 
hours Training: 2022-04-27 10:25:08,949-Speed 3293.02 samples/sec Loss 4.2388 LearningRate 0.0354 Epoch: 8 Global Step: 100580 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-04-27 10:25:12,047-Speed 3305.89 samples/sec Loss 4.3066 LearningRate 0.0354 Epoch: 8 Global Step: 100590 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:15,148-Speed 3304.45 samples/sec Loss 4.2569 LearningRate 0.0354 Epoch: 8 Global Step: 100600 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:18,209-Speed 3345.94 samples/sec Loss 4.2806 LearningRate 0.0354 Epoch: 8 Global Step: 100610 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:21,325-Speed 3287.23 samples/sec Loss 4.3225 LearningRate 0.0354 Epoch: 8 Global Step: 100620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:24,532-Speed 3193.75 samples/sec Loss 4.2211 LearningRate 0.0354 Epoch: 8 Global Step: 100630 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:27,708-Speed 3225.71 samples/sec Loss 4.2047 LearningRate 0.0354 Epoch: 8 Global Step: 100640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:30,858-Speed 3251.23 samples/sec Loss 4.2328 LearningRate 0.0354 Epoch: 8 Global Step: 100650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:33,938-Speed 3325.34 samples/sec Loss 4.2526 LearningRate 0.0354 Epoch: 8 Global Step: 100660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:37,094-Speed 3246.23 samples/sec Loss 4.2625 LearningRate 0.0354 Epoch: 8 Global Step: 100670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:40,227-Speed 3269.72 samples/sec Loss 4.3428 LearningRate 0.0354 Epoch: 8 Global Step: 100680 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:25:43,419-Speed 3208.73 samples/sec Loss 4.2791 LearningRate 0.0354 Epoch: 8 Global Step: 100690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
10:25:46,510-Speed 3314.26 samples/sec Loss 4.3020 LearningRate 0.0354 Epoch: 8 Global Step: 100700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:25:49,652-Speed 3259.96 samples/sec Loss 4.2187 LearningRate 0.0354 Epoch: 8 Global Step: 100710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:25:52,803-Speed 3250.11 samples/sec Loss 4.3909 LearningRate 0.0353 Epoch: 8 Global Step: 100720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:25:55,908-Speed 3299.43 samples/sec Loss 4.2882 LearningRate 0.0353 Epoch: 8 Global Step: 100730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:25:58,984-Speed 3330.62 samples/sec Loss 4.2967 LearningRate 0.0353 Epoch: 8 Global Step: 100740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:02,140-Speed 3245.67 samples/sec Loss 4.2902 LearningRate 0.0353 Epoch: 8 Global Step: 100750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:05,225-Speed 3319.91 samples/sec Loss 4.3019 LearningRate 0.0353 Epoch: 8 Global Step: 100760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:08,288-Speed 3344.31 samples/sec Loss 4.2648 LearningRate 0.0353 Epoch: 8 Global Step: 100770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:11,377-Speed 3315.95 samples/sec Loss 4.2737 LearningRate 0.0353 Epoch: 8 Global Step: 100780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:14,472-Speed 3310.19 samples/sec Loss 4.3572 LearningRate 0.0353 Epoch: 8 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:26:17,567-Speed 3309.49 samples/sec Loss 4.2817 LearningRate 0.0353 Epoch: 8 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:26:20,647-Speed 3324.79 samples/sec Loss 4.3646 LearningRate 0.0353 Epoch: 8 Global Step: 100810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:23,713-Speed 3341.15 samples/sec Loss 
4.2804 LearningRate 0.0353 Epoch: 8 Global Step: 100820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:26,804-Speed 3314.56 samples/sec Loss 4.3192 LearningRate 0.0353 Epoch: 8 Global Step: 100830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:29,968-Speed 3237.20 samples/sec Loss 4.2754 LearningRate 0.0353 Epoch: 8 Global Step: 100840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:33,057-Speed 3315.94 samples/sec Loss 4.2997 LearningRate 0.0353 Epoch: 8 Global Step: 100850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:36,248-Speed 3209.91 samples/sec Loss 4.3510 LearningRate 0.0353 Epoch: 8 Global Step: 100860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:39,418-Speed 3230.66 samples/sec Loss 4.3395 LearningRate 0.0353 Epoch: 8 Global Step: 100870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:42,539-Speed 3282.33 samples/sec Loss 4.2751 LearningRate 0.0353 Epoch: 8 Global Step: 100880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:45,624-Speed 3320.53 samples/sec Loss 4.3408 LearningRate 0.0353 Epoch: 8 Global Step: 100890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:48,793-Speed 3231.67 samples/sec Loss 4.3584 LearningRate 0.0353 Epoch: 8 Global Step: 100900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:26:51,909-Speed 3287.74 samples/sec Loss 4.3743 LearningRate 0.0353 Epoch: 8 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:26:55,092-Speed 3218.53 samples/sec Loss 4.3693 LearningRate 0.0353 Epoch: 8 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:26:58,130-Speed 3370.90 samples/sec Loss 4.3171 LearningRate 0.0352 Epoch: 8 Global Step: 100930 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:01,252-Speed 3280.70 samples/sec Loss 4.2990 LearningRate 0.0352 Epoch: 8 Global 
Step: 100940 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:04,336-Speed 3322.07 samples/sec Loss 4.3385 LearningRate 0.0352 Epoch: 8 Global Step: 100950 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:07,518-Speed 3218.81 samples/sec Loss 4.3060 LearningRate 0.0352 Epoch: 8 Global Step: 100960 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:10,625-Speed 3296.24 samples/sec Loss 4.4000 LearningRate 0.0352 Epoch: 8 Global Step: 100970 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:13,772-Speed 3254.85 samples/sec Loss 4.2863 LearningRate 0.0352 Epoch: 8 Global Step: 100980 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:16,868-Speed 3308.72 samples/sec Loss 4.3171 LearningRate 0.0352 Epoch: 8 Global Step: 100990 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:19,956-Speed 3317.65 samples/sec Loss 4.4145 LearningRate 0.0352 Epoch: 8 Global Step: 101000 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:23,018-Speed 3344.99 samples/sec Loss 4.3590 LearningRate 0.0352 Epoch: 8 Global Step: 101010 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:26,092-Speed 3332.05 samples/sec Loss 4.3036 LearningRate 0.0352 Epoch: 8 Global Step: 101020 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:27:29,169-Speed 3328.85 samples/sec Loss 4.4496 LearningRate 0.0352 Epoch: 8 Global Step: 101030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:32,251-Speed 3324.20 samples/sec Loss 4.3785 LearningRate 0.0352 Epoch: 8 Global Step: 101040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:35,406-Speed 3246.08 samples/sec Loss 4.2895 LearningRate 0.0352 Epoch: 8 Global Step: 101050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:38,552-Speed 3255.75 samples/sec Loss 4.3030 LearningRate 0.0352 Epoch: 8 Global Step: 101060 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 10:27:41,649-Speed 3308.23 samples/sec Loss 4.3187 LearningRate 0.0352 Epoch: 8 Global Step: 101070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:44,731-Speed 3323.57 samples/sec Loss 4.3490 LearningRate 0.0352 Epoch: 8 Global Step: 101080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:47,851-Speed 3282.50 samples/sec Loss 4.3840 LearningRate 0.0352 Epoch: 8 Global Step: 101090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:51,092-Speed 3160.94 samples/sec Loss 4.3801 LearningRate 0.0352 Epoch: 8 Global Step: 101100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:54,246-Speed 3247.46 samples/sec Loss 4.3385 LearningRate 0.0352 Epoch: 8 Global Step: 101110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:27:57,337-Speed 3313.96 samples/sec Loss 4.3365 LearningRate 0.0352 Epoch: 8 Global Step: 101120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:00,514-Speed 3224.07 samples/sec Loss 4.3487 LearningRate 0.0352 Epoch: 8 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:28:03,692-Speed 3223.74 samples/sec Loss 4.3464 LearningRate 0.0351 Epoch: 8 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:28:06,886-Speed 3206.88 samples/sec Loss 4.3792 LearningRate 0.0351 Epoch: 8 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:28:09,955-Speed 3337.77 samples/sec Loss 4.3660 LearningRate 0.0351 Epoch: 8 Global Step: 101160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:28:13,011-Speed 3351.69 samples/sec Loss 4.4209 LearningRate 0.0351 Epoch: 8 Global Step: 101170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:28:16,211-Speed 3201.11 samples/sec Loss 4.4275 LearningRate 0.0351 Epoch: 8 Global Step: 101180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
10:28:19,316-Speed 3298.83 samples/sec Loss 4.2504 LearningRate 0.0351 Epoch: 8 Global Step: 101190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:28:22,412-Speed 3308.83 samples/sec Loss 4.3485 LearningRate 0.0351 Epoch: 8 Global Step: 101200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:25,529-Speed 3286.44 samples/sec Loss 4.3731 LearningRate 0.0351 Epoch: 8 Global Step: 101210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:28,728-Speed 3201.79 samples/sec Loss 4.3516 LearningRate 0.0351 Epoch: 8 Global Step: 101220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:31,818-Speed 3315.33 samples/sec Loss 4.3552 LearningRate 0.0351 Epoch: 8 Global Step: 101230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:34,928-Speed 3294.14 samples/sec Loss 4.3796 LearningRate 0.0351 Epoch: 8 Global Step: 101240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:38,023-Speed 3309.20 samples/sec Loss 4.2667 LearningRate 0.0351 Epoch: 8 Global Step: 101250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:41,105-Speed 3323.63 samples/sec Loss 4.4227 LearningRate 0.0351 Epoch: 8 Global Step: 101260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:44,208-Speed 3301.61 samples/sec Loss 4.3903 LearningRate 0.0351 Epoch: 8 Global Step: 101270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:47,277-Speed 3336.64 samples/sec Loss 4.3816 LearningRate 0.0351 Epoch: 8 Global Step: 101280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:50,367-Speed 3316.04 samples/sec Loss 4.4185 LearningRate 0.0351 Epoch: 8 Global Step: 101290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:28:53,434-Speed 3339.60 samples/sec Loss 4.5434 LearningRate 0.0351 Epoch: 8 Global Step: 101300 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:28:56,500-Speed 3340.84 samples/sec Loss 
4.4285 LearningRate 0.0351 Epoch: 8 Global Step: 101310 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:28:59,651-Speed 3252.18 samples/sec Loss 4.4279 LearningRate 0.0351 Epoch: 8 Global Step: 101320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:02,781-Speed 3272.47 samples/sec Loss 4.4453 LearningRate 0.0351 Epoch: 8 Global Step: 101330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:05,879-Speed 3306.73 samples/sec Loss 4.4307 LearningRate 0.0351 Epoch: 8 Global Step: 101340 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:08,942-Speed 3344.17 samples/sec Loss 4.3622 LearningRate 0.0350 Epoch: 8 Global Step: 101350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:12,025-Speed 3323.15 samples/sec Loss 4.5010 LearningRate 0.0350 Epoch: 8 Global Step: 101360 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:15,095-Speed 3336.56 samples/sec Loss 4.3219 LearningRate 0.0350 Epoch: 8 Global Step: 101370 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:18,255-Speed 3241.55 samples/sec Loss 4.4356 LearningRate 0.0350 Epoch: 8 Global Step: 101380 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:21,309-Speed 3353.66 samples/sec Loss 4.4041 LearningRate 0.0350 Epoch: 8 Global Step: 101390 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:29:24,461-Speed 3249.42 samples/sec Loss 4.5370 LearningRate 0.0350 Epoch: 8 Global Step: 101400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:27,602-Speed 3261.48 samples/sec Loss 4.3870 LearningRate 0.0350 Epoch: 8 Global Step: 101410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:30,789-Speed 3213.74 samples/sec Loss 4.4095 LearningRate 0.0350 Epoch: 8 Global Step: 101420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:33,900-Speed 3293.05 samples/sec Loss 4.4756 LearningRate 0.0350 Epoch: 8 Global 
Step: 101430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:36,993-Speed 3311.80 samples/sec Loss 4.3050 LearningRate 0.0350 Epoch: 8 Global Step: 101440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:40,069-Speed 3330.34 samples/sec Loss 4.4669 LearningRate 0.0350 Epoch: 8 Global Step: 101450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:43,162-Speed 3311.53 samples/sec Loss 4.6072 LearningRate 0.0350 Epoch: 8 Global Step: 101460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:46,260-Speed 3306.63 samples/sec Loss 4.4922 LearningRate 0.0350 Epoch: 8 Global Step: 101470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:49,358-Speed 3305.95 samples/sec Loss 4.3451 LearningRate 0.0350 Epoch: 8 Global Step: 101480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:52,531-Speed 3228.33 samples/sec Loss 4.4352 LearningRate 0.0350 Epoch: 8 Global Step: 101490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:29:55,594-Speed 3344.55 samples/sec Loss 4.4883 LearningRate 0.0350 Epoch: 8 Global Step: 101500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:29:58,701-Speed 3296.69 samples/sec Loss 4.4866 LearningRate 0.0350 Epoch: 8 Global Step: 101510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:30:01,783-Speed 3323.51 samples/sec Loss 4.4344 LearningRate 0.0350 Epoch: 8 Global Step: 101520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:04,872-Speed 3316.66 samples/sec Loss 4.4024 LearningRate 0.0350 Epoch: 8 Global Step: 101530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:08,058-Speed 3214.80 samples/sec Loss 4.4893 LearningRate 0.0350 Epoch: 8 Global Step: 101540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:11,170-Speed 3291.35 samples/sec Loss 4.4274 LearningRate 0.0350 Epoch: 8 Global Step: 101550 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 10:30:14,303-Speed 3269.70 samples/sec Loss 4.5283 LearningRate 0.0349 Epoch: 8 Global Step: 101560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:17,423-Speed 3283.10 samples/sec Loss 4.3755 LearningRate 0.0349 Epoch: 8 Global Step: 101570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:20,504-Speed 3325.37 samples/sec Loss 4.3768 LearningRate 0.0349 Epoch: 8 Global Step: 101580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:23,592-Speed 3317.13 samples/sec Loss 4.4450 LearningRate 0.0349 Epoch: 8 Global Step: 101590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:26,694-Speed 3302.22 samples/sec Loss 4.4091 LearningRate 0.0349 Epoch: 8 Global Step: 101600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:29,793-Speed 3305.16 samples/sec Loss 4.4275 LearningRate 0.0349 Epoch: 8 Global Step: 101610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:32,894-Speed 3303.31 samples/sec Loss 4.4021 LearningRate 0.0349 Epoch: 8 Global Step: 101620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:30:36,057-Speed 3238.18 samples/sec Loss 4.5256 LearningRate 0.0349 Epoch: 8 Global Step: 101630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:39,184-Speed 3275.90 samples/sec Loss 4.3146 LearningRate 0.0349 Epoch: 8 Global Step: 101640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:42,296-Speed 3291.65 samples/sec Loss 4.4896 LearningRate 0.0349 Epoch: 8 Global Step: 101650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:45,402-Speed 3298.34 samples/sec Loss 4.5004 LearningRate 0.0349 Epoch: 8 Global Step: 101660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:48,534-Speed 3270.81 samples/sec Loss 4.4322 LearningRate 0.0349 Epoch: 8 Global Step: 101670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
10:30:51,651-Speed 3285.57 samples/sec Loss 4.4949 LearningRate 0.0349 Epoch: 8 Global Step: 101680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:30:54,696-Speed 3364.18 samples/sec Loss 4.4620 LearningRate 0.0349 Epoch: 8 Global Step: 101690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:30:57,783-Speed 3318.02 samples/sec Loss 4.4583 LearningRate 0.0349 Epoch: 8 Global Step: 101700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:01,001-Speed 3183.25 samples/sec Loss 4.4989 LearningRate 0.0349 Epoch: 8 Global Step: 101710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:04,116-Speed 3288.74 samples/sec Loss 4.4996 LearningRate 0.0349 Epoch: 8 Global Step: 101720 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:07,189-Speed 3333.71 samples/sec Loss 4.5533 LearningRate 0.0349 Epoch: 8 Global Step: 101730 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:10,258-Speed 3337.18 samples/sec Loss 4.5453 LearningRate 0.0349 Epoch: 8 Global Step: 101740 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:13,374-Speed 3287.27 samples/sec Loss 4.4911 LearningRate 0.0349 Epoch: 8 Global Step: 101750 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:16,486-Speed 3292.14 samples/sec Loss 4.5375 LearningRate 0.0349 Epoch: 8 Global Step: 101760 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:19,610-Speed 3278.33 samples/sec Loss 4.3993 LearningRate 0.0348 Epoch: 8 Global Step: 101770 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:22,684-Speed 3331.50 samples/sec Loss 4.5237 LearningRate 0.0348 Epoch: 8 Global Step: 101780 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:31:25,807-Speed 3280.90 samples/sec Loss 4.5179 LearningRate 0.0348 Epoch: 8 Global Step: 101790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:28,959-Speed 3249.98 samples/sec Loss 
4.3616 LearningRate 0.0348 Epoch: 8 Global Step: 101800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:32,059-Speed 3304.33 samples/sec Loss 4.4223 LearningRate 0.0348 Epoch: 8 Global Step: 101810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:35,158-Speed 3305.07 samples/sec Loss 4.5534 LearningRate 0.0348 Epoch: 8 Global Step: 101820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:38,238-Speed 3326.00 samples/sec Loss 4.4696 LearningRate 0.0348 Epoch: 8 Global Step: 101830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:41,315-Speed 3328.37 samples/sec Loss 4.5882 LearningRate 0.0348 Epoch: 8 Global Step: 101840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:44,398-Speed 3322.50 samples/sec Loss 4.5067 LearningRate 0.0348 Epoch: 8 Global Step: 101850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:47,578-Speed 3220.95 samples/sec Loss 4.5549 LearningRate 0.0348 Epoch: 8 Global Step: 101860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:50,733-Speed 3246.86 samples/sec Loss 4.5152 LearningRate 0.0348 Epoch: 8 Global Step: 101870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:53,887-Speed 3247.45 samples/sec Loss 4.5022 LearningRate 0.0348 Epoch: 8 Global Step: 101880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:31:56,984-Speed 3307.62 samples/sec Loss 4.4392 LearningRate 0.0348 Epoch: 8 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:32:00,069-Speed 3320.62 samples/sec Loss 4.4415 LearningRate 0.0348 Epoch: 8 Global Step: 101900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:32:03,177-Speed 3295.54 samples/sec Loss 4.4657 LearningRate 0.0348 Epoch: 8 Global Step: 101910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:06,283-Speed 3299.06 samples/sec Loss 4.5275 LearningRate 0.0348 Epoch: 8 Global 
Step: 101920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:09,400-Speed 3286.46 samples/sec Loss 4.6317 LearningRate 0.0348 Epoch: 8 Global Step: 101930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:12,553-Speed 3249.11 samples/sec Loss 4.4299 LearningRate 0.0348 Epoch: 8 Global Step: 101940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:15,643-Speed 3314.30 samples/sec Loss 4.5406 LearningRate 0.0348 Epoch: 8 Global Step: 101950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:18,755-Speed 3292.24 samples/sec Loss 4.4639 LearningRate 0.0348 Epoch: 8 Global Step: 101960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:21,827-Speed 3334.26 samples/sec Loss 4.4982 LearningRate 0.0348 Epoch: 8 Global Step: 101970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:24,920-Speed 3311.82 samples/sec Loss 4.5486 LearningRate 0.0347 Epoch: 8 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:28,047-Speed 3275.97 samples/sec Loss 4.4800 LearningRate 0.0347 Epoch: 8 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:31,155-Speed 3295.99 samples/sec Loss 4.4989 LearningRate 0.0347 Epoch: 8 Global Step: 102000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:34,230-Speed 3331.14 samples/sec Loss 4.4168 LearningRate 0.0347 Epoch: 8 Global Step: 102010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:32:37,326-Speed 3307.97 samples/sec Loss 4.5291 LearningRate 0.0347 Epoch: 8 Global Step: 102020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:40,409-Speed 3322.34 samples/sec Loss 4.4602 LearningRate 0.0347 Epoch: 8 Global Step: 102030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:43,471-Speed 3345.84 samples/sec Loss 4.4666 LearningRate 0.0347 Epoch: 8 Global Step: 102040 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 10:32:46,568-Speed 3307.41 samples/sec Loss 4.5392 LearningRate 0.0347 Epoch: 8 Global Step: 102050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:49,672-Speed 3299.95 samples/sec Loss 4.5503 LearningRate 0.0347 Epoch: 8 Global Step: 102060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:52,781-Speed 3294.83 samples/sec Loss 4.4417 LearningRate 0.0347 Epoch: 8 Global Step: 102070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:55,952-Speed 3230.30 samples/sec Loss 4.3787 LearningRate 0.0347 Epoch: 8 Global Step: 102080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:32:59,046-Speed 3310.65 samples/sec Loss 4.5295 LearningRate 0.0347 Epoch: 8 Global Step: 102090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:02,140-Speed 3310.99 samples/sec Loss 4.6324 LearningRate 0.0347 Epoch: 8 Global Step: 102100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:05,243-Speed 3300.59 samples/sec Loss 4.5079 LearningRate 0.0347 Epoch: 8 Global Step: 102110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:08,380-Speed 3265.80 samples/sec Loss 4.5894 LearningRate 0.0347 Epoch: 8 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:33:11,543-Speed 3237.56 samples/sec Loss 4.6007 LearningRate 0.0347 Epoch: 8 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:33:14,637-Speed 3311.44 samples/sec Loss 4.5649 LearningRate 0.0347 Epoch: 8 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:33:17,756-Speed 3283.26 samples/sec Loss 4.5704 LearningRate 0.0347 Epoch: 8 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:33:20,858-Speed 3302.57 samples/sec Loss 4.5439 LearningRate 0.0347 Epoch: 8 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
10:33:23,995-Speed 3264.59 samples/sec Loss 4.5744 LearningRate 0.0347 Epoch: 8 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:33:27,094-Speed 3306.06 samples/sec Loss 4.5224 LearningRate 0.0347 Epoch: 8 Global Step: 102180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:30,233-Speed 3263.54 samples/sec Loss 4.4865 LearningRate 0.0346 Epoch: 8 Global Step: 102190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:33,375-Speed 3259.57 samples/sec Loss 4.5636 LearningRate 0.0346 Epoch: 8 Global Step: 102200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:36,526-Speed 3250.90 samples/sec Loss 4.4601 LearningRate 0.0346 Epoch: 8 Global Step: 102210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:39,665-Speed 3262.55 samples/sec Loss 4.5376 LearningRate 0.0346 Epoch: 8 Global Step: 102220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:42,773-Speed 3296.24 samples/sec Loss 4.5342 LearningRate 0.0346 Epoch: 8 Global Step: 102230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:45,876-Speed 3301.52 samples/sec Loss 4.5486 LearningRate 0.0346 Epoch: 8 Global Step: 102240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:49,044-Speed 3232.82 samples/sec Loss 4.5963 LearningRate 0.0346 Epoch: 8 Global Step: 102250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:52,154-Speed 3293.68 samples/sec Loss 4.5941 LearningRate 0.0346 Epoch: 8 Global Step: 102260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:55,307-Speed 3248.78 samples/sec Loss 4.4866 LearningRate 0.0346 Epoch: 8 Global Step: 102270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:33:58,379-Speed 3334.42 samples/sec Loss 4.5843 LearningRate 0.0346 Epoch: 8 Global Step: 102280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:34:01,519-Speed 3262.89 samples/sec Loss 
4.5485 LearningRate 0.0346 Epoch: 8 Global Step: 102290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:34:04,581-Speed 3345.13 samples/sec Loss 4.5498 LearningRate 0.0346 Epoch: 8 Global Step: 102300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:34:07,652-Speed 3334.96 samples/sec Loss 4.5574 LearningRate 0.0346 Epoch: 8 Global Step: 102310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:10,725-Speed 3333.39 samples/sec Loss 4.5327 LearningRate 0.0346 Epoch: 8 Global Step: 102320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:13,933-Speed 3193.03 samples/sec Loss 4.6138 LearningRate 0.0346 Epoch: 8 Global Step: 102330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:17,120-Speed 3214.03 samples/sec Loss 4.5973 LearningRate 0.0346 Epoch: 8 Global Step: 102340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:20,227-Speed 3297.19 samples/sec Loss 4.5240 LearningRate 0.0346 Epoch: 8 Global Step: 102350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:23,374-Speed 3254.50 samples/sec Loss 4.5334 LearningRate 0.0346 Epoch: 8 Global Step: 102360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:26,436-Speed 3345.17 samples/sec Loss 4.4757 LearningRate 0.0346 Epoch: 8 Global Step: 102370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:29,613-Speed 3223.85 samples/sec Loss 4.5604 LearningRate 0.0346 Epoch: 8 Global Step: 102380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:32,705-Speed 3313.36 samples/sec Loss 4.5553 LearningRate 0.0346 Epoch: 8 Global Step: 102390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:35,883-Speed 3223.79 samples/sec Loss 4.5469 LearningRate 0.0346 Epoch: 8 Global Step: 102400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:39,003-Speed 3281.92 samples/sec Loss 4.5401 LearningRate 0.0345 Epoch: 8 Global 
Step: 102410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:34:42,112-Speed 3294.67 samples/sec Loss 4.5302 LearningRate 0.0345 Epoch: 8 Global Step: 102420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:45,192-Speed 3327.24 samples/sec Loss 4.6033 LearningRate 0.0345 Epoch: 8 Global Step: 102430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:48,348-Speed 3245.18 samples/sec Loss 4.5392 LearningRate 0.0345 Epoch: 8 Global Step: 102440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:51,479-Speed 3271.31 samples/sec Loss 4.5597 LearningRate 0.0345 Epoch: 8 Global Step: 102450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:54,631-Speed 3250.24 samples/sec Loss 4.5888 LearningRate 0.0345 Epoch: 8 Global Step: 102460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:34:57,729-Speed 3306.52 samples/sec Loss 4.5506 LearningRate 0.0345 Epoch: 8 Global Step: 102470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:00,869-Speed 3262.10 samples/sec Loss 4.6134 LearningRate 0.0345 Epoch: 8 Global Step: 102480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:03,959-Speed 3314.44 samples/sec Loss 4.5439 LearningRate 0.0345 Epoch: 8 Global Step: 102490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:07,039-Speed 3326.66 samples/sec Loss 4.5673 LearningRate 0.0345 Epoch: 8 Global Step: 102500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:10,127-Speed 3316.82 samples/sec Loss 4.6349 LearningRate 0.0345 Epoch: 8 Global Step: 102510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:13,217-Speed 3314.90 samples/sec Loss 4.6472 LearningRate 0.0345 Epoch: 8 Global Step: 102520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:35:16,398-Speed 3220.83 samples/sec Loss 4.5491 LearningRate 0.0345 Epoch: 8 Global Step: 102530 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 10:35:19,473-Speed 3330.75 samples/sec Loss 4.6338 LearningRate 0.0345 Epoch: 8 Global Step: 102540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:22,552-Speed 3326.89 samples/sec Loss 4.5110 LearningRate 0.0345 Epoch: 8 Global Step: 102550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:25,623-Speed 3335.54 samples/sec Loss 4.5941 LearningRate 0.0345 Epoch: 8 Global Step: 102560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:28,763-Speed 3262.52 samples/sec Loss 4.6598 LearningRate 0.0345 Epoch: 8 Global Step: 102570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:31,904-Speed 3261.82 samples/sec Loss 4.6708 LearningRate 0.0345 Epoch: 8 Global Step: 102580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:35,049-Speed 3256.85 samples/sec Loss 4.6129 LearningRate 0.0345 Epoch: 8 Global Step: 102590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:38,140-Speed 3313.22 samples/sec Loss 4.5855 LearningRate 0.0345 Epoch: 8 Global Step: 102600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:35:41,255-Speed 3288.27 samples/sec Loss 4.6240 LearningRate 0.0345 Epoch: 8 Global Step: 102610 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:35:44,332-Speed 3329.12 samples/sec Loss 4.5366 LearningRate 0.0344 Epoch: 8 Global Step: 102620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:35:47,398-Speed 3341.62 samples/sec Loss 4.5494 LearningRate 0.0344 Epoch: 8 Global Step: 102630 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:35:50,544-Speed 3255.70 samples/sec Loss 4.5203 LearningRate 0.0344 Epoch: 8 Global Step: 102640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:35:53,639-Speed 3310.06 samples/sec Loss 4.6153 LearningRate 0.0344 Epoch: 8 Global Step: 102650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 
10:35:56,716-Speed 3328.27 samples/sec Loss 4.6203 LearningRate 0.0344 Epoch: 8 Global Step: 102660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:35:59,808-Speed 3313.26 samples/sec Loss 4.5645 LearningRate 0.0344 Epoch: 8 Global Step: 102670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:36:02,925-Speed 3286.58 samples/sec Loss 4.5845 LearningRate 0.0344 Epoch: 8 Global Step: 102680 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:36:06,035-Speed 3293.52 samples/sec Loss 4.5166 LearningRate 0.0344 Epoch: 8 Global Step: 102690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:36:09,128-Speed 3311.50 samples/sec Loss 4.6140 LearningRate 0.0344 Epoch: 8 Global Step: 102700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 10:36:12,199-Speed 3335.33 samples/sec Loss 4.6680 LearningRate 0.0344 Epoch: 8 Global Step: 102710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:15,335-Speed 3266.55 samples/sec Loss 4.6267 LearningRate 0.0344 Epoch: 8 Global Step: 102720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:18,419-Speed 3321.40 samples/sec Loss 4.5571 LearningRate 0.0344 Epoch: 8 Global Step: 102730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:21,479-Speed 3347.81 samples/sec Loss 4.5262 LearningRate 0.0344 Epoch: 8 Global Step: 102740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:24,583-Speed 3299.54 samples/sec Loss 4.5477 LearningRate 0.0344 Epoch: 8 Global Step: 102750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:27,659-Speed 3330.40 samples/sec Loss 4.6393 LearningRate 0.0344 Epoch: 8 Global Step: 102760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:30,748-Speed 3316.02 samples/sec Loss 4.5375 LearningRate 0.0344 Epoch: 8 Global Step: 102770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:33,837-Speed 3315.69 samples/sec Loss 
4.6641 LearningRate 0.0344 Epoch: 8 Global Step: 102780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:36,961-Speed 3279.25 samples/sec Loss 4.5856 LearningRate 0.0344 Epoch: 8 Global Step: 102790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:40,087-Speed 3276.94 samples/sec Loss 4.4936 LearningRate 0.0344 Epoch: 8 Global Step: 102800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:43,200-Speed 3289.61 samples/sec Loss 4.6028 LearningRate 0.0344 Epoch: 8 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 10:36:46,250-Speed 3359.70 samples/sec Loss 4.5381 LearningRate 0.0344 Epoch: 8 Global Step: 102820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:49,330-Speed 3325.11 samples/sec Loss 4.5618 LearningRate 0.0343 Epoch: 8 Global Step: 102830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:52,461-Speed 3271.52 samples/sec Loss 4.5938 LearningRate 0.0343 Epoch: 8 Global Step: 102840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:55,533-Speed 3334.15 samples/sec Loss 4.7107 LearningRate 0.0343 Epoch: 8 Global Step: 102850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:36:58,616-Speed 3322.85 samples/sec Loss 4.6234 LearningRate 0.0343 Epoch: 8 Global Step: 102860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:37:01,761-Speed 3256.97 samples/sec Loss 4.5711 LearningRate 0.0343 Epoch: 8 Global Step: 102870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:37:04,852-Speed 3314.10 samples/sec Loss 4.5956 LearningRate 0.0343 Epoch: 8 Global Step: 102880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 10:37:08,001-Speed 3253.12 samples/sec Loss 4.6073 LearningRate 0.0343 Epoch: 8 Global Step: 102890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:11,073-Speed 3334.30 samples/sec Loss 4.5781 LearningRate 0.0343 Epoch: 8 Global 
Step: 102900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:14,271-Speed 3202.90 samples/sec Loss 4.6293 LearningRate 0.0343 Epoch: 8 Global Step: 102910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:17,427-Speed 3245.84 samples/sec Loss 4.6210 LearningRate 0.0343 Epoch: 8 Global Step: 102920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:37:20,517-Speed 3314.52 samples/sec Loss 4.5752 LearningRate 0.0343 Epoch: 8 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:37:23,631-Speed 3289.49 samples/sec Loss 4.7203 LearningRate 0.0343 Epoch: 8 Global Step: 102940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:37:26,786-Speed 3247.04 samples/sec Loss 4.4796 LearningRate 0.0343 Epoch: 8 Global Step: 102950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:29,873-Speed 3318.53 samples/sec Loss 4.6123 LearningRate 0.0343 Epoch: 8 Global Step: 102960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:32,945-Speed 3333.72 samples/sec Loss 4.6654 LearningRate 0.0343 Epoch: 8 Global Step: 102970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:36,008-Speed 3344.44 samples/sec Loss 4.6680 LearningRate 0.0343 Epoch: 8 Global Step: 102980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:39,105-Speed 3307.93 samples/sec Loss 4.5884 LearningRate 0.0343 Epoch: 8 Global Step: 102990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:42,256-Speed 3250.76 samples/sec Loss 4.5993 LearningRate 0.0343 Epoch: 8 Global Step: 103000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:45,306-Speed 3358.33 samples/sec Loss 4.6234 LearningRate 0.0343 Epoch: 8 Global Step: 103010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:48,403-Speed 3307.43 samples/sec Loss 4.6321 LearningRate 0.0343 Epoch: 8 Global Step: 103020 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 10:37:51,581-Speed 3223.38 samples/sec Loss 4.4934 LearningRate 0.0343 Epoch: 8 Global Step: 103030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:54,719-Speed 3265.11 samples/sec Loss 4.5510 LearningRate 0.0342 Epoch: 8 Global Step: 103040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:37:57,843-Speed 3278.20 samples/sec Loss 4.6291 LearningRate 0.0342 Epoch: 8 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:38:00,932-Speed 3316.41 samples/sec Loss 4.5894 LearningRate 0.0342 Epoch: 8 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:38:04,020-Speed 3316.26 samples/sec Loss 4.5306 LearningRate 0.0342 Epoch: 8 Global Step: 103070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:07,155-Speed 3268.22 samples/sec Loss 4.5819 LearningRate 0.0342 Epoch: 8 Global Step: 103080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:10,204-Speed 3358.94 samples/sec Loss 4.6445 LearningRate 0.0342 Epoch: 8 Global Step: 103090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:13,270-Speed 3341.37 samples/sec Loss 4.6628 LearningRate 0.0342 Epoch: 8 Global Step: 103100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:16,395-Speed 3277.80 samples/sec Loss 4.6946 LearningRate 0.0342 Epoch: 8 Global Step: 103110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:19,540-Speed 3256.83 samples/sec Loss 4.5976 LearningRate 0.0342 Epoch: 8 Global Step: 103120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:22,605-Speed 3341.95 samples/sec Loss 4.6095 LearningRate 0.0342 Epoch: 8 Global Step: 103130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:25,744-Speed 3263.73 samples/sec Loss 4.6328 LearningRate 0.0342 Epoch: 8 Global Step: 103140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
10:38:28,828-Speed 3321.15 samples/sec Loss 4.6141 LearningRate 0.0342 Epoch: 8 Global Step: 103150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:31,963-Speed 3267.38 samples/sec Loss 4.6852 LearningRate 0.0342 Epoch: 8 Global Step: 103160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:35,038-Speed 3330.40 samples/sec Loss 4.7425 LearningRate 0.0342 Epoch: 8 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:38:38,115-Speed 3328.86 samples/sec Loss 4.6568 LearningRate 0.0342 Epoch: 8 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:38:41,254-Speed 3263.52 samples/sec Loss 4.6022 LearningRate 0.0342 Epoch: 8 Global Step: 103190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:44,352-Speed 3306.70 samples/sec Loss 4.6089 LearningRate 0.0342 Epoch: 8 Global Step: 103200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:47,456-Speed 3299.93 samples/sec Loss 4.6360 LearningRate 0.0342 Epoch: 8 Global Step: 103210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:50,548-Speed 3312.77 samples/sec Loss 4.6309 LearningRate 0.0342 Epoch: 8 Global Step: 103220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:53,638-Speed 3314.51 samples/sec Loss 4.7416 LearningRate 0.0342 Epoch: 8 Global Step: 103230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:56,720-Speed 3323.82 samples/sec Loss 4.6063 LearningRate 0.0342 Epoch: 8 Global Step: 103240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:38:59,824-Speed 3300.42 samples/sec Loss 4.6465 LearningRate 0.0341 Epoch: 8 Global Step: 103250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:02,990-Speed 3234.54 samples/sec Loss 4.6552 LearningRate 0.0341 Epoch: 8 Global Step: 103260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:06,169-Speed 3223.49 samples/sec Loss 
4.6063 LearningRate 0.0341 Epoch: 8 Global Step: 103270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:09,235-Speed 3340.76 samples/sec Loss 4.5641 LearningRate 0.0341 Epoch: 8 Global Step: 103280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:12,363-Speed 3274.18 samples/sec Loss 4.6278 LearningRate 0.0341 Epoch: 8 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:39:15,492-Speed 3273.77 samples/sec Loss 4.6141 LearningRate 0.0341 Epoch: 8 Global Step: 103300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:18,645-Speed 3249.21 samples/sec Loss 4.7499 LearningRate 0.0341 Epoch: 8 Global Step: 103310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:21,696-Speed 3357.81 samples/sec Loss 4.6273 LearningRate 0.0341 Epoch: 8 Global Step: 103320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:24,881-Speed 3215.86 samples/sec Loss 4.5904 LearningRate 0.0341 Epoch: 8 Global Step: 103330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:28,012-Speed 3270.85 samples/sec Loss 4.6815 LearningRate 0.0341 Epoch: 8 Global Step: 103340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:31,212-Speed 3201.76 samples/sec Loss 4.5592 LearningRate 0.0341 Epoch: 8 Global Step: 103350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:34,283-Speed 3335.04 samples/sec Loss 4.5940 LearningRate 0.0341 Epoch: 8 Global Step: 103360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:37,417-Speed 3268.43 samples/sec Loss 4.6396 LearningRate 0.0341 Epoch: 8 Global Step: 103370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:40,509-Speed 3313.31 samples/sec Loss 4.7269 LearningRate 0.0341 Epoch: 8 Global Step: 103380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:43,643-Speed 3267.98 samples/sec Loss 4.6621 LearningRate 0.0341 Epoch: 8 Global 
Step: 103390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:46,724-Speed 3325.60 samples/sec Loss 4.6604 LearningRate 0.0341 Epoch: 8 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:39:49,812-Speed 3316.30 samples/sec Loss 4.6590 LearningRate 0.0341 Epoch: 8 Global Step: 103410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:39:52,923-Speed 3292.85 samples/sec Loss 4.6391 LearningRate 0.0341 Epoch: 8 Global Step: 103420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:56,014-Speed 3314.41 samples/sec Loss 4.7823 LearningRate 0.0341 Epoch: 8 Global Step: 103430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:39:59,082-Speed 3338.50 samples/sec Loss 4.5466 LearningRate 0.0341 Epoch: 8 Global Step: 103440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:40:02,145-Speed 3344.59 samples/sec Loss 4.6998 LearningRate 0.0341 Epoch: 8 Global Step: 103450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:40:05,282-Speed 3264.93 samples/sec Loss 4.7091 LearningRate 0.0341 Epoch: 8 Global Step: 103460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:40:08,342-Speed 3348.17 samples/sec Loss 4.6479 LearningRate 0.0340 Epoch: 8 Global Step: 103470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:40:11,402-Speed 3346.73 samples/sec Loss 4.6166 LearningRate 0.0340 Epoch: 8 Global Step: 103480 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:14,553-Speed 3251.25 samples/sec Loss 4.6764 LearningRate 0.0340 Epoch: 8 Global Step: 103490 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:17,721-Speed 3233.07 samples/sec Loss 4.7779 LearningRate 0.0340 Epoch: 8 Global Step: 103500 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:20,837-Speed 3286.84 samples/sec Loss 4.7347 LearningRate 0.0340 Epoch: 8 Global Step: 103510 Fp16 Grad Scale: 8192 Required: 
12 hours Training: 2022-04-27 10:40:23,916-Speed 3327.41 samples/sec Loss 4.6342 LearningRate 0.0340 Epoch: 8 Global Step: 103520 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:27,002-Speed 3319.06 samples/sec Loss 4.6675 LearningRate 0.0340 Epoch: 8 Global Step: 103530 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:30,210-Speed 3192.84 samples/sec Loss 4.5405 LearningRate 0.0340 Epoch: 8 Global Step: 103540 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:33,311-Speed 3303.93 samples/sec Loss 4.6184 LearningRate 0.0340 Epoch: 8 Global Step: 103550 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:36,390-Speed 3326.24 samples/sec Loss 4.6401 LearningRate 0.0340 Epoch: 8 Global Step: 103560 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:39,525-Speed 3267.37 samples/sec Loss 4.7023 LearningRate 0.0340 Epoch: 8 Global Step: 103570 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-27 10:40:42,689-Speed 3237.56 samples/sec Loss 4.6492 LearningRate 0.0340 Epoch: 8 Global Step: 103580 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:40:45,742-Speed 3355.08 samples/sec Loss 4.5446 LearningRate 0.0340 Epoch: 8 Global Step: 103590 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:40:48,871-Speed 3274.23 samples/sec Loss 4.5393 LearningRate 0.0340 Epoch: 8 Global Step: 103600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:40:52,070-Speed 3201.62 samples/sec Loss 4.7183 LearningRate 0.0340 Epoch: 8 Global Step: 103610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:40:55,198-Speed 3274.60 samples/sec Loss 4.5902 LearningRate 0.0340 Epoch: 8 Global Step: 103620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:40:58,290-Speed 3313.37 samples/sec Loss 4.6535 LearningRate 0.0340 Epoch: 8 Global Step: 103630 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:01,385-Speed 
3309.69 samples/sec Loss 4.6611 LearningRate 0.0340 Epoch: 8 Global Step: 103640 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:04,469-Speed 3321.09 samples/sec Loss 4.7352 LearningRate 0.0340 Epoch: 8 Global Step: 103650 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:07,639-Speed 3231.51 samples/sec Loss 4.6954 LearningRate 0.0340 Epoch: 8 Global Step: 103660 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:10,719-Speed 3325.62 samples/sec Loss 4.6521 LearningRate 0.0340 Epoch: 8 Global Step: 103670 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:13,880-Speed 3240.24 samples/sec Loss 4.7244 LearningRate 0.0339 Epoch: 8 Global Step: 103680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:41:16,958-Speed 3328.27 samples/sec Loss 4.6300 LearningRate 0.0339 Epoch: 8 Global Step: 103690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:41:20,103-Speed 3257.06 samples/sec Loss 4.6252 LearningRate 0.0339 Epoch: 8 Global Step: 103700 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:23,232-Speed 3273.95 samples/sec Loss 4.6848 LearningRate 0.0339 Epoch: 8 Global Step: 103710 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:26,355-Speed 3280.20 samples/sec Loss 4.6633 LearningRate 0.0339 Epoch: 8 Global Step: 103720 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:29,472-Speed 3285.77 samples/sec Loss 4.6397 LearningRate 0.0339 Epoch: 8 Global Step: 103730 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:32,583-Speed 3292.36 samples/sec Loss 4.6248 LearningRate 0.0339 Epoch: 8 Global Step: 103740 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:35,686-Speed 3306.89 samples/sec Loss 4.7233 LearningRate 0.0339 Epoch: 8 Global Step: 103750 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:38,848-Speed 3238.79 samples/sec Loss 4.6778 
LearningRate 0.0339 Epoch: 8 Global Step: 103760 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:42,006-Speed 3243.66 samples/sec Loss 4.7101 LearningRate 0.0339 Epoch: 8 Global Step: 103770 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:45,814-Speed 2689.73 samples/sec Loss 4.7342 LearningRate 0.0339 Epoch: 8 Global Step: 103780 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:48,959-Speed 3256.91 samples/sec Loss 4.6651 LearningRate 0.0339 Epoch: 8 Global Step: 103790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:41:52,142-Speed 3218.48 samples/sec Loss 4.7795 LearningRate 0.0339 Epoch: 8 Global Step: 103800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:41:55,251-Speed 3295.04 samples/sec Loss 4.7637 LearningRate 0.0339 Epoch: 8 Global Step: 103810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:41:58,346-Speed 3308.80 samples/sec Loss 4.6943 LearningRate 0.0339 Epoch: 8 Global Step: 103820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:01,507-Speed 3240.82 samples/sec Loss 4.6744 LearningRate 0.0339 Epoch: 8 Global Step: 103830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:04,582-Speed 3331.00 samples/sec Loss 4.7194 LearningRate 0.0339 Epoch: 8 Global Step: 103840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:07,761-Speed 3222.48 samples/sec Loss 4.7296 LearningRate 0.0339 Epoch: 8 Global Step: 103850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:10,845-Speed 3321.30 samples/sec Loss 4.6932 LearningRate 0.0339 Epoch: 8 Global Step: 103860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:13,941-Speed 3308.48 samples/sec Loss 4.6594 LearningRate 0.0339 Epoch: 8 Global Step: 103870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:17,033-Speed 3312.74 samples/sec Loss 4.7419 LearningRate 0.0339 Epoch: 8 Global Step: 
103880 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:20,095-Speed 3345.09 samples/sec Loss 4.7079 LearningRate 0.0338 Epoch: 8 Global Step: 103890 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:23,175-Speed 3326.43 samples/sec Loss 4.6337 LearningRate 0.0338 Epoch: 8 Global Step: 103900 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:26,317-Speed 3259.44 samples/sec Loss 4.5966 LearningRate 0.0338 Epoch: 8 Global Step: 103910 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:29,457-Speed 3261.86 samples/sec Loss 4.7009 LearningRate 0.0338 Epoch: 8 Global Step: 103920 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:32,539-Speed 3323.89 samples/sec Loss 4.6810 LearningRate 0.0338 Epoch: 8 Global Step: 103930 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:35,604-Speed 3341.56 samples/sec Loss 4.7269 LearningRate 0.0338 Epoch: 8 Global Step: 103940 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:38,680-Speed 3330.64 samples/sec Loss 4.6540 LearningRate 0.0338 Epoch: 8 Global Step: 103950 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:41,763-Speed 3322.28 samples/sec Loss 4.7984 LearningRate 0.0338 Epoch: 8 Global Step: 103960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:44,826-Speed 3344.58 samples/sec Loss 4.7024 LearningRate 0.0338 Epoch: 8 Global Step: 103970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:42:47,960-Speed 3268.09 samples/sec Loss 4.7230 LearningRate 0.0338 Epoch: 8 Global Step: 103980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:51,061-Speed 3303.01 samples/sec Loss 4.7333 LearningRate 0.0338 Epoch: 8 Global Step: 103990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:42:54,152-Speed 3313.74 samples/sec Loss 4.7268 LearningRate 0.0338 Epoch: 8 Global Step: 104000 Fp16 Grad Scale: 32768 Required: 12 
hours Training: 2022-04-27 10:42:57,260-Speed 3295.97 samples/sec Loss 4.6654 LearningRate 0.0338 Epoch: 8 Global Step: 104010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:00,342-Speed 3322.87 samples/sec Loss 4.7141 LearningRate 0.0338 Epoch: 8 Global Step: 104020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:03,469-Speed 3275.54 samples/sec Loss 4.6752 LearningRate 0.0338 Epoch: 8 Global Step: 104030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:06,592-Speed 3280.18 samples/sec Loss 4.6798 LearningRate 0.0338 Epoch: 8 Global Step: 104040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:09,668-Speed 3329.97 samples/sec Loss 4.7006 LearningRate 0.0338 Epoch: 8 Global Step: 104050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:12,820-Speed 3250.09 samples/sec Loss 4.7749 LearningRate 0.0338 Epoch: 8 Global Step: 104060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:15,910-Speed 3315.27 samples/sec Loss 4.6762 LearningRate 0.0338 Epoch: 8 Global Step: 104070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:19,036-Speed 3276.34 samples/sec Loss 4.7109 LearningRate 0.0338 Epoch: 8 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:43:22,108-Speed 3334.28 samples/sec Loss 4.7013 LearningRate 0.0338 Epoch: 8 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:43:25,194-Speed 3319.98 samples/sec Loss 4.7201 LearningRate 0.0338 Epoch: 8 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:43:28,303-Speed 3294.15 samples/sec Loss 4.7102 LearningRate 0.0337 Epoch: 8 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:43:31,390-Speed 3318.43 samples/sec Loss 4.6498 LearningRate 0.0337 Epoch: 8 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
10:43:34,455-Speed 3341.72 samples/sec Loss 4.6545 LearningRate 0.0337 Epoch: 8 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:43:37,563-Speed 3295.00 samples/sec Loss 4.6988 LearningRate 0.0337 Epoch: 8 Global Step: 104140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:40,709-Speed 3256.18 samples/sec Loss 4.6342 LearningRate 0.0337 Epoch: 8 Global Step: 104150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:43,812-Speed 3301.45 samples/sec Loss 4.7320 LearningRate 0.0337 Epoch: 8 Global Step: 104160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:46,880-Speed 3338.69 samples/sec Loss 4.6512 LearningRate 0.0337 Epoch: 8 Global Step: 104170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:49,960-Speed 3325.68 samples/sec Loss 4.7622 LearningRate 0.0337 Epoch: 8 Global Step: 104180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:53,038-Speed 3327.59 samples/sec Loss 4.6693 LearningRate 0.0337 Epoch: 8 Global Step: 104190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:56,146-Speed 3296.41 samples/sec Loss 4.6054 LearningRate 0.0337 Epoch: 8 Global Step: 104200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:43:59,234-Speed 3316.83 samples/sec Loss 4.5860 LearningRate 0.0337 Epoch: 8 Global Step: 104210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:44:02,341-Speed 3296.60 samples/sec Loss 4.5930 LearningRate 0.0337 Epoch: 8 Global Step: 104220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:44:05,510-Speed 3232.09 samples/sec Loss 4.7540 LearningRate 0.0337 Epoch: 8 Global Step: 104230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:44:08,619-Speed 3295.21 samples/sec Loss 4.7260 LearningRate 0.0337 Epoch: 8 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:11,703-Speed 3322.12 samples/sec Loss 
4.7150 LearningRate 0.0337 Epoch: 8 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:14,795-Speed 3312.15 samples/sec Loss 4.7436 LearningRate 0.0337 Epoch: 8 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:17,894-Speed 3305.98 samples/sec Loss 4.7123 LearningRate 0.0337 Epoch: 8 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:20,988-Speed 3310.78 samples/sec Loss 4.7423 LearningRate 0.0337 Epoch: 8 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:24,058-Speed 3336.65 samples/sec Loss 4.6466 LearningRate 0.0337 Epoch: 8 Global Step: 104290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:27,182-Speed 3278.67 samples/sec Loss 4.6750 LearningRate 0.0337 Epoch: 8 Global Step: 104300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:30,354-Speed 3229.09 samples/sec Loss 4.6964 LearningRate 0.0337 Epoch: 8 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:44:33,444-Speed 3315.14 samples/sec Loss 4.7234 LearningRate 0.0336 Epoch: 8 Global Step: 104320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:44:36,615-Speed 3230.33 samples/sec Loss 4.7154 LearningRate 0.0336 Epoch: 8 Global Step: 104330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:44:39,716-Speed 3302.50 samples/sec Loss 4.7099 LearningRate 0.0336 Epoch: 8 Global Step: 104340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:44:42,771-Speed 3352.80 samples/sec Loss 4.7182 LearningRate 0.0336 Epoch: 8 Global Step: 104350 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:44:45,855-Speed 3321.64 samples/sec Loss 4.7380 LearningRate 0.0336 Epoch: 8 Global Step: 104360 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:44:49,013-Speed 3244.22 samples/sec Loss 4.7833 LearningRate 0.0336 Epoch: 8 Global 
Step: 104370 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:44:52,169-Speed 3244.50 samples/sec Loss 4.7037 LearningRate 0.0336 Epoch: 8 Global Step: 104380 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:44:55,329-Speed 3241.96 samples/sec Loss 4.7299 LearningRate 0.0336 Epoch: 8 Global Step: 104390 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:44:58,392-Speed 3344.28 samples/sec Loss 4.7947 LearningRate 0.0336 Epoch: 8 Global Step: 104400 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:45:01,507-Speed 3288.37 samples/sec Loss 4.8041 LearningRate 0.0336 Epoch: 8 Global Step: 104410 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:45:04,668-Speed 3240.41 samples/sec Loss 4.7458 LearningRate 0.0336 Epoch: 8 Global Step: 104420 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:45:07,860-Speed 3208.67 samples/sec Loss 4.7263 LearningRate 0.0336 Epoch: 8 Global Step: 104430 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:45:10,986-Speed 3277.11 samples/sec Loss 4.7609 LearningRate 0.0336 Epoch: 8 Global Step: 104440 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:45:14,211-Speed 3176.22 samples/sec Loss 4.7268 LearningRate 0.0336 Epoch: 8 Global Step: 104450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:17,386-Speed 3226.80 samples/sec Loss 4.7085 LearningRate 0.0336 Epoch: 8 Global Step: 104460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:20,553-Speed 3233.95 samples/sec Loss 4.7346 LearningRate 0.0336 Epoch: 8 Global Step: 104470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:23,653-Speed 3304.35 samples/sec Loss 4.7237 LearningRate 0.0336 Epoch: 8 Global Step: 104480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:26,787-Speed 3268.31 samples/sec Loss 4.6661 LearningRate 0.0336 Epoch: 8 Global Step: 104490 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 10:45:29,873-Speed 3319.64 samples/sec Loss 4.7742 LearningRate 0.0336 Epoch: 8 Global Step: 104500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:32,959-Speed 3318.95 samples/sec Loss 4.7061 LearningRate 0.0336 Epoch: 8 Global Step: 104510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:36,164-Speed 3196.33 samples/sec Loss 4.7381 LearningRate 0.0336 Epoch: 8 Global Step: 104520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:39,264-Speed 3303.52 samples/sec Loss 4.7155 LearningRate 0.0335 Epoch: 8 Global Step: 104530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:42,322-Speed 3350.62 samples/sec Loss 4.7431 LearningRate 0.0335 Epoch: 8 Global Step: 104540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:45,384-Speed 3345.06 samples/sec Loss 4.7420 LearningRate 0.0335 Epoch: 8 Global Step: 104550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:48,559-Speed 3225.72 samples/sec Loss 4.7741 LearningRate 0.0335 Epoch: 8 Global Step: 104560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:51,688-Speed 3273.65 samples/sec Loss 4.7533 LearningRate 0.0335 Epoch: 8 Global Step: 104570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:54,789-Speed 3302.92 samples/sec Loss 4.7233 LearningRate 0.0335 Epoch: 8 Global Step: 104580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:45:57,886-Speed 3308.28 samples/sec Loss 4.7895 LearningRate 0.0335 Epoch: 8 Global Step: 104590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:01,002-Speed 3287.66 samples/sec Loss 4.6707 LearningRate 0.0335 Epoch: 8 Global Step: 104600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:04,134-Speed 3269.91 samples/sec Loss 4.6198 LearningRate 0.0335 Epoch: 8 Global Step: 104610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
10:46:07,247-Speed 3290.95 samples/sec Loss 4.6690 LearningRate 0.0335 Epoch: 8 Global Step: 104620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:10,337-Speed 3314.25 samples/sec Loss 4.7410 LearningRate 0.0335 Epoch: 8 Global Step: 104630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:13,427-Speed 3315.56 samples/sec Loss 4.7670 LearningRate 0.0335 Epoch: 8 Global Step: 104640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:16,511-Speed 3321.39 samples/sec Loss 4.8185 LearningRate 0.0335 Epoch: 8 Global Step: 104650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:46:19,627-Speed 3287.06 samples/sec Loss 4.7573 LearningRate 0.0335 Epoch: 8 Global Step: 104660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:46:22,730-Speed 3301.03 samples/sec Loss 4.7011 LearningRate 0.0335 Epoch: 8 Global Step: 104670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:46:25,892-Speed 3239.20 samples/sec Loss 4.6925 LearningRate 0.0335 Epoch: 8 Global Step: 104680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:28,991-Speed 3305.52 samples/sec Loss 4.7832 LearningRate 0.0335 Epoch: 8 Global Step: 104690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:32,069-Speed 3327.64 samples/sec Loss 4.7274 LearningRate 0.0335 Epoch: 8 Global Step: 104700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:35,163-Speed 3310.84 samples/sec Loss 4.7418 LearningRate 0.0335 Epoch: 8 Global Step: 104710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:38,246-Speed 3322.53 samples/sec Loss 4.8923 LearningRate 0.0335 Epoch: 8 Global Step: 104720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:41,361-Speed 3287.57 samples/sec Loss 4.7834 LearningRate 0.0335 Epoch: 8 Global Step: 104730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:44,416-Speed 3353.49 samples/sec Loss 
4.7430 LearningRate 0.0335 Epoch: 8 Global Step: 104740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:47,506-Speed 3314.71 samples/sec Loss 4.8115 LearningRate 0.0334 Epoch: 8 Global Step: 104750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:50,644-Speed 3265.00 samples/sec Loss 4.7088 LearningRate 0.0334 Epoch: 8 Global Step: 104760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:53,717-Speed 3333.34 samples/sec Loss 4.7315 LearningRate 0.0334 Epoch: 8 Global Step: 104770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:46:56,772-Speed 3352.87 samples/sec Loss 4.8313 LearningRate 0.0334 Epoch: 8 Global Step: 104780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:46:59,856-Speed 3320.83 samples/sec Loss 4.7124 LearningRate 0.0334 Epoch: 8 Global Step: 104790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:47:02,983-Speed 3275.83 samples/sec Loss 4.7103 LearningRate 0.0334 Epoch: 8 Global Step: 104800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:06,120-Speed 3265.77 samples/sec Loss 4.6600 LearningRate 0.0334 Epoch: 8 Global Step: 104810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:09,230-Speed 3293.51 samples/sec Loss 4.7048 LearningRate 0.0334 Epoch: 8 Global Step: 104820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:12,293-Speed 3344.50 samples/sec Loss 4.7709 LearningRate 0.0334 Epoch: 8 Global Step: 104830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:15,412-Speed 3284.14 samples/sec Loss 4.7651 LearningRate 0.0334 Epoch: 8 Global Step: 104840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:18,494-Speed 3322.63 samples/sec Loss 4.7940 LearningRate 0.0334 Epoch: 8 Global Step: 104850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:21,621-Speed 3276.01 samples/sec Loss 4.8275 LearningRate 0.0334 Epoch: 8 Global 
Step: 104860 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:24,779-Speed 3243.10 samples/sec Loss 4.6991 LearningRate 0.0334 Epoch: 8 Global Step: 104870 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:27,946-Speed 3234.07 samples/sec Loss 4.7090 LearningRate 0.0334 Epoch: 8 Global Step: 104880 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:31,145-Speed 3203.00 samples/sec Loss 4.8091 LearningRate 0.0334 Epoch: 8 Global Step: 104890 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:34,370-Speed 3175.99 samples/sec Loss 4.7425 LearningRate 0.0334 Epoch: 8 Global Step: 104900 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:47:37,508-Speed 3263.89 samples/sec Loss 4.7574 LearningRate 0.0334 Epoch: 8 Global Step: 104910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:40,641-Speed 3269.19 samples/sec Loss 4.8816 LearningRate 0.0334 Epoch: 8 Global Step: 104920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:43,711-Speed 3336.76 samples/sec Loss 4.7588 LearningRate 0.0334 Epoch: 8 Global Step: 104930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:46,783-Speed 3334.82 samples/sec Loss 4.7807 LearningRate 0.0334 Epoch: 8 Global Step: 104940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:49,908-Speed 3276.92 samples/sec Loss 4.7136 LearningRate 0.0334 Epoch: 8 Global Step: 104950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:53,046-Speed 3265.23 samples/sec Loss 4.6302 LearningRate 0.0333 Epoch: 8 Global Step: 104960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:56,131-Speed 3320.04 samples/sec Loss 4.7783 LearningRate 0.0333 Epoch: 8 Global Step: 104970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:47:59,211-Speed 3325.29 samples/sec Loss 4.6934 LearningRate 0.0333 Epoch: 8 Global Step: 104980 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 10:48:02,342-Speed 3271.66 samples/sec Loss 4.7024 LearningRate 0.0333 Epoch: 8 Global Step: 104990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:05,524-Speed 3219.12 samples/sec Loss 4.8311 LearningRate 0.0333 Epoch: 8 Global Step: 105000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:08,581-Speed 3350.86 samples/sec Loss 4.7594 LearningRate 0.0333 Epoch: 8 Global Step: 105010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:11,674-Speed 3312.06 samples/sec Loss 4.8230 LearningRate 0.0333 Epoch: 8 Global Step: 105020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:14,765-Speed 3313.68 samples/sec Loss 4.6642 LearningRate 0.0333 Epoch: 8 Global Step: 105030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:17,885-Speed 3282.66 samples/sec Loss 4.6767 LearningRate 0.0333 Epoch: 8 Global Step: 105040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:21,027-Speed 3260.32 samples/sec Loss 4.7872 LearningRate 0.0333 Epoch: 8 Global Step: 105050 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:24,102-Speed 3330.71 samples/sec Loss 4.6630 LearningRate 0.0333 Epoch: 8 Global Step: 105060 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:27,230-Speed 3274.52 samples/sec Loss 4.7033 LearningRate 0.0333 Epoch: 8 Global Step: 105070 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:30,390-Speed 3241.93 samples/sec Loss 4.8720 LearningRate 0.0333 Epoch: 8 Global Step: 105080 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:33,475-Speed 3319.87 samples/sec Loss 4.7758 LearningRate 0.0333 Epoch: 8 Global Step: 105090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:36,638-Speed 3239.28 samples/sec Loss 4.6204 LearningRate 0.0333 Epoch: 8 Global Step: 105100 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 
10:48:39,789-Speed 3250.83 samples/sec Loss 4.7878 LearningRate 0.0333 Epoch: 8 Global Step: 105110 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:42,857-Speed 3338.13 samples/sec Loss 4.8174 LearningRate 0.0333 Epoch: 8 Global Step: 105120 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:45,914-Speed 3351.69 samples/sec Loss 4.7610 LearningRate 0.0333 Epoch: 8 Global Step: 105130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:49,104-Speed 3210.62 samples/sec Loss 4.8452 LearningRate 0.0333 Epoch: 8 Global Step: 105140 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:48:52,224-Speed 3283.72 samples/sec Loss 4.7876 LearningRate 0.0333 Epoch: 8 Global Step: 105150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:55,282-Speed 3349.44 samples/sec Loss 4.6869 LearningRate 0.0333 Epoch: 8 Global Step: 105160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:48:58,331-Speed 3359.05 samples/sec Loss 4.8600 LearningRate 0.0333 Epoch: 8 Global Step: 105170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:01,413-Speed 3323.74 samples/sec Loss 4.8673 LearningRate 0.0332 Epoch: 8 Global Step: 105180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:04,561-Speed 3254.34 samples/sec Loss 4.6991 LearningRate 0.0332 Epoch: 8 Global Step: 105190 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:07,624-Speed 3344.23 samples/sec Loss 4.7507 LearningRate 0.0332 Epoch: 8 Global Step: 105200 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:10,743-Speed 3283.90 samples/sec Loss 4.7860 LearningRate 0.0332 Epoch: 8 Global Step: 105210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:13,834-Speed 3313.61 samples/sec Loss 4.7407 LearningRate 0.0332 Epoch: 8 Global Step: 105220 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:16,952-Speed 3285.78 samples/sec Loss 
4.6603 LearningRate 0.0332 Epoch: 8 Global Step: 105230 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:20,036-Speed 3320.91 samples/sec Loss 4.8616 LearningRate 0.0332 Epoch: 8 Global Step: 105240 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:23,113-Speed 3329.20 samples/sec Loss 4.8392 LearningRate 0.0332 Epoch: 8 Global Step: 105250 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:26,167-Speed 3353.89 samples/sec Loss 4.7092 LearningRate 0.0332 Epoch: 8 Global Step: 105260 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:29,221-Speed 3354.42 samples/sec Loss 4.7253 LearningRate 0.0332 Epoch: 8 Global Step: 105270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:32,299-Speed 3327.19 samples/sec Loss 4.7709 LearningRate 0.0332 Epoch: 8 Global Step: 105280 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:49:35,352-Speed 3356.13 samples/sec Loss 4.7435 LearningRate 0.0332 Epoch: 8 Global Step: 105290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:38,424-Speed 3334.53 samples/sec Loss 4.7190 LearningRate 0.0332 Epoch: 8 Global Step: 105300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:41,531-Speed 3295.94 samples/sec Loss 4.8083 LearningRate 0.0332 Epoch: 8 Global Step: 105310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:44,617-Speed 3319.65 samples/sec Loss 4.7579 LearningRate 0.0332 Epoch: 8 Global Step: 105320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:47,737-Speed 3283.60 samples/sec Loss 4.8252 LearningRate 0.0332 Epoch: 8 Global Step: 105330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:50,832-Speed 3309.03 samples/sec Loss 4.8090 LearningRate 0.0332 Epoch: 8 Global Step: 105340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:53,904-Speed 3334.44 samples/sec Loss 4.7178 LearningRate 0.0332 Epoch: 8 Global 
Step: 105350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:49:56,981-Speed 3329.27 samples/sec Loss 4.7653 LearningRate 0.0332 Epoch: 8 Global Step: 105360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:00,038-Speed 3351.21 samples/sec Loss 4.7773 LearningRate 0.0332 Epoch: 8 Global Step: 105370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:03,117-Speed 3326.36 samples/sec Loss 4.7413 LearningRate 0.0332 Epoch: 8 Global Step: 105380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:06,225-Speed 3295.50 samples/sec Loss 4.8454 LearningRate 0.0331 Epoch: 8 Global Step: 105390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:50:09,268-Speed 3366.56 samples/sec Loss 4.7611 LearningRate 0.0331 Epoch: 8 Global Step: 105400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:12,481-Speed 3187.26 samples/sec Loss 4.8121 LearningRate 0.0331 Epoch: 8 Global Step: 105410 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:15,666-Speed 3216.28 samples/sec Loss 4.7008 LearningRate 0.0331 Epoch: 8 Global Step: 105420 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:18,911-Speed 3156.88 samples/sec Loss 4.8150 LearningRate 0.0331 Epoch: 8 Global Step: 105430 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:21,964-Speed 3354.76 samples/sec Loss 4.7284 LearningRate 0.0331 Epoch: 8 Global Step: 105440 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:25,054-Speed 3315.79 samples/sec Loss 4.8403 LearningRate 0.0331 Epoch: 8 Global Step: 105450 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:28,141-Speed 3317.66 samples/sec Loss 4.7737 LearningRate 0.0331 Epoch: 8 Global Step: 105460 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:31,260-Speed 3284.47 samples/sec Loss 4.8006 LearningRate 0.0331 Epoch: 8 Global Step: 105470 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 10:50:34,315-Speed 3352.56 samples/sec Loss 4.8910 LearningRate 0.0331 Epoch: 8 Global Step: 105480 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:37,403-Speed 3316.92 samples/sec Loss 4.7654 LearningRate 0.0331 Epoch: 8 Global Step: 105490 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:40,495-Speed 3313.36 samples/sec Loss 4.7760 LearningRate 0.0331 Epoch: 8 Global Step: 105500 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:50:43,545-Speed 3358.72 samples/sec Loss 4.7644 LearningRate 0.0331 Epoch: 8 Global Step: 105510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:46,683-Speed 3263.50 samples/sec Loss 4.7437 LearningRate 0.0331 Epoch: 8 Global Step: 105520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:49,854-Speed 3230.28 samples/sec Loss 4.7202 LearningRate 0.0331 Epoch: 8 Global Step: 105530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:52,973-Speed 3283.97 samples/sec Loss 4.7140 LearningRate 0.0331 Epoch: 8 Global Step: 105540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:56,148-Speed 3226.71 samples/sec Loss 4.7077 LearningRate 0.0331 Epoch: 8 Global Step: 105550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:50:59,200-Speed 3356.06 samples/sec Loss 4.7843 LearningRate 0.0331 Epoch: 8 Global Step: 105560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:02,333-Speed 3268.87 samples/sec Loss 4.8938 LearningRate 0.0331 Epoch: 8 Global Step: 105570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:05,460-Speed 3276.55 samples/sec Loss 4.7814 LearningRate 0.0331 Epoch: 8 Global Step: 105580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:08,582-Speed 3280.25 samples/sec Loss 4.7207 LearningRate 0.0331 Epoch: 8 Global Step: 105590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
10:51:11,685-Speed 3300.84 samples/sec Loss 4.8849 LearningRate 0.0331 Epoch: 8 Global Step: 105600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:14,770-Speed 3320.94 samples/sec Loss 4.8406 LearningRate 0.0330 Epoch: 8 Global Step: 105610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:51:17,838-Speed 3339.09 samples/sec Loss 4.7168 LearningRate 0.0330 Epoch: 8 Global Step: 105620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:51:20,889-Speed 3356.47 samples/sec Loss 4.7588 LearningRate 0.0330 Epoch: 8 Global Step: 105630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:24,006-Speed 3286.29 samples/sec Loss 4.7737 LearningRate 0.0330 Epoch: 8 Global Step: 105640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:27,209-Speed 3198.39 samples/sec Loss 4.8687 LearningRate 0.0330 Epoch: 8 Global Step: 105650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:30,324-Speed 3288.89 samples/sec Loss 4.8462 LearningRate 0.0330 Epoch: 8 Global Step: 105660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:33,391-Speed 3338.97 samples/sec Loss 4.7764 LearningRate 0.0330 Epoch: 8 Global Step: 105670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:36,511-Speed 3282.78 samples/sec Loss 4.8479 LearningRate 0.0330 Epoch: 8 Global Step: 105680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:39,597-Speed 3319.56 samples/sec Loss 4.8630 LearningRate 0.0330 Epoch: 8 Global Step: 105690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:42,718-Speed 3282.53 samples/sec Loss 4.8491 LearningRate 0.0330 Epoch: 8 Global Step: 105700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:45,782-Speed 3343.09 samples/sec Loss 4.7066 LearningRate 0.0330 Epoch: 8 Global Step: 105710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:48,869-Speed 3317.78 samples/sec Loss 
4.7783 LearningRate 0.0330 Epoch: 8 Global Step: 105720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:51,924-Speed 3354.23 samples/sec Loss 4.8782 LearningRate 0.0330 Epoch: 8 Global Step: 105730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:51:54,989-Speed 3341.43 samples/sec Loss 4.8698 LearningRate 0.0330 Epoch: 8 Global Step: 105740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:51:58,049-Speed 3348.30 samples/sec Loss 4.9494 LearningRate 0.0330 Epoch: 8 Global Step: 105750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:52:01,130-Speed 3324.29 samples/sec Loss 4.7985 LearningRate 0.0330 Epoch: 8 Global Step: 105760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:52:04,204-Speed 3331.74 samples/sec Loss 4.8049 LearningRate 0.0330 Epoch: 8 Global Step: 105770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:52:07,258-Speed 3354.49 samples/sec Loss 4.7977 LearningRate 0.0330 Epoch: 8 Global Step: 105780 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:10,340-Speed 3323.47 samples/sec Loss 4.8246 LearningRate 0.0330 Epoch: 8 Global Step: 105790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:13,490-Speed 3251.95 samples/sec Loss 4.8126 LearningRate 0.0330 Epoch: 8 Global Step: 105800 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:16,659-Speed 3232.71 samples/sec Loss 4.7642 LearningRate 0.0330 Epoch: 8 Global Step: 105810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:19,716-Speed 3350.05 samples/sec Loss 4.7742 LearningRate 0.0330 Epoch: 8 Global Step: 105820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:22,782-Speed 3340.81 samples/sec Loss 4.8657 LearningRate 0.0329 Epoch: 8 Global Step: 105830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:25,908-Speed 3277.32 samples/sec Loss 4.8771 LearningRate 0.0329 Epoch: 8 Global 
Step: 105840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:29,071-Speed 3238.72 samples/sec Loss 4.8456 LearningRate 0.0329 Epoch: 8 Global Step: 105850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:32,170-Speed 3304.91 samples/sec Loss 4.8459 LearningRate 0.0329 Epoch: 8 Global Step: 105860 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:35,241-Speed 3336.03 samples/sec Loss 4.8931 LearningRate 0.0329 Epoch: 8 Global Step: 105870 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:38,328-Speed 3317.14 samples/sec Loss 4.9690 LearningRate 0.0329 Epoch: 8 Global Step: 105880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:52:41,406-Speed 3328.83 samples/sec Loss 4.8932 LearningRate 0.0329 Epoch: 8 Global Step: 105890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:52:44,468-Speed 3344.93 samples/sec Loss 4.8377 LearningRate 0.0329 Epoch: 8 Global Step: 105900 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:47,533-Speed 3342.76 samples/sec Loss 4.9214 LearningRate 0.0329 Epoch: 8 Global Step: 105910 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:50,595-Speed 3344.83 samples/sec Loss 4.7995 LearningRate 0.0329 Epoch: 8 Global Step: 105920 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:53,671-Speed 3330.39 samples/sec Loss 4.8949 LearningRate 0.0329 Epoch: 8 Global Step: 105930 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:56,737-Speed 3340.82 samples/sec Loss 4.7976 LearningRate 0.0329 Epoch: 8 Global Step: 105940 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:52:59,814-Speed 3329.28 samples/sec Loss 4.7893 LearningRate 0.0329 Epoch: 8 Global Step: 105950 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:53:02,928-Speed 3288.77 samples/sec Loss 4.8005 LearningRate 0.0329 Epoch: 8 Global Step: 105960 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 10:53:06,013-Speed 3320.00 samples/sec Loss 4.8835 LearningRate 0.0329 Epoch: 8 Global Step: 105970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:53:09,090-Speed 3329.27 samples/sec Loss 4.8095 LearningRate 0.0329 Epoch: 8 Global Step: 105980 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:53:12,172-Speed 3323.93 samples/sec Loss 4.8425 LearningRate 0.0329 Epoch: 8 Global Step: 105990 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:53:15,305-Speed 3269.17 samples/sec Loss 4.7856 LearningRate 0.0329 Epoch: 8 Global Step: 106000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:18,417-Speed 3291.24 samples/sec Loss 4.7714 LearningRate 0.0329 Epoch: 8 Global Step: 106010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:21,486-Speed 3338.10 samples/sec Loss 4.7656 LearningRate 0.0329 Epoch: 8 Global Step: 106020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:24,631-Speed 3256.86 samples/sec Loss 4.7583 LearningRate 0.0329 Epoch: 8 Global Step: 106030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:27,703-Speed 3334.07 samples/sec Loss 4.7427 LearningRate 0.0328 Epoch: 8 Global Step: 106040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:30,830-Speed 3275.91 samples/sec Loss 4.8180 LearningRate 0.0328 Epoch: 8 Global Step: 106050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:33,967-Speed 3265.17 samples/sec Loss 4.7648 LearningRate 0.0328 Epoch: 8 Global Step: 106060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:37,080-Speed 3290.55 samples/sec Loss 4.8795 LearningRate 0.0328 Epoch: 8 Global Step: 106070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:40,245-Speed 3236.84 samples/sec Loss 4.9245 LearningRate 0.0328 Epoch: 8 Global Step: 106080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
10:53:43,369-Speed 3278.72 samples/sec Loss 4.9360 LearningRate 0.0328 Epoch: 8 Global Step: 106090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:46,432-Speed 3344.92 samples/sec Loss 4.8000 LearningRate 0.0328 Epoch: 8 Global Step: 106100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:53:49,548-Speed 3286.70 samples/sec Loss 4.7651 LearningRate 0.0328 Epoch: 8 Global Step: 106110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:52,758-Speed 3190.62 samples/sec Loss 4.8605 LearningRate 0.0328 Epoch: 8 Global Step: 106120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:55,863-Speed 3299.24 samples/sec Loss 4.7843 LearningRate 0.0328 Epoch: 8 Global Step: 106130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:53:58,981-Speed 3285.16 samples/sec Loss 4.8086 LearningRate 0.0328 Epoch: 8 Global Step: 106140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:02,082-Speed 3303.34 samples/sec Loss 4.8501 LearningRate 0.0328 Epoch: 8 Global Step: 106150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:05,154-Speed 3335.43 samples/sec Loss 4.7818 LearningRate 0.0328 Epoch: 8 Global Step: 106160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:08,204-Speed 3357.79 samples/sec Loss 4.9124 LearningRate 0.0328 Epoch: 8 Global Step: 106170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:11,302-Speed 3306.03 samples/sec Loss 4.9020 LearningRate 0.0328 Epoch: 8 Global Step: 106180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:14,397-Speed 3310.35 samples/sec Loss 4.8806 LearningRate 0.0328 Epoch: 8 Global Step: 106190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:17,524-Speed 3275.39 samples/sec Loss 4.8565 LearningRate 0.0328 Epoch: 8 Global Step: 106200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:20,607-Speed 3322.77 samples/sec Loss 
4.6871 LearningRate 0.0328 Epoch: 8 Global Step: 106210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:54:23,700-Speed 3312.11 samples/sec Loss 4.8540 LearningRate 0.0328 Epoch: 8 Global Step: 106220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:54:26,798-Speed 3305.54 samples/sec Loss 4.8057 LearningRate 0.0328 Epoch: 8 Global Step: 106230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:29,876-Speed 3327.76 samples/sec Loss 4.8091 LearningRate 0.0328 Epoch: 8 Global Step: 106240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:32,970-Speed 3311.12 samples/sec Loss 4.8888 LearningRate 0.0328 Epoch: 8 Global Step: 106250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:36,090-Speed 3283.83 samples/sec Loss 4.8282 LearningRate 0.0327 Epoch: 8 Global Step: 106260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:39,163-Speed 3332.65 samples/sec Loss 4.7770 LearningRate 0.0327 Epoch: 8 Global Step: 106270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:42,349-Speed 3215.03 samples/sec Loss 4.7948 LearningRate 0.0327 Epoch: 8 Global Step: 106280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:45,431-Speed 3324.39 samples/sec Loss 4.8189 LearningRate 0.0327 Epoch: 8 Global Step: 106290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:54:48,588-Speed 3243.74 samples/sec Loss 4.6973 LearningRate 0.0327 Epoch: 8 Global Step: 106300 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:54:51,694-Speed 3298.51 samples/sec Loss 4.7824 LearningRate 0.0327 Epoch: 8 Global Step: 106310 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:54:54,827-Speed 3268.94 samples/sec Loss 4.9043 LearningRate 0.0327 Epoch: 8 Global Step: 106320 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:54:57,902-Speed 3331.51 samples/sec Loss 4.8665 LearningRate 0.0327 Epoch: 8 Global 
Step: 106330 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:55:00,954-Speed 3357.60 samples/sec Loss 4.7920 LearningRate 0.0327 Epoch: 8 Global Step: 106340 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:55:04,091-Speed 3264.50 samples/sec Loss 4.8696 LearningRate 0.0327 Epoch: 8 Global Step: 106350 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:55:07,252-Speed 3240.53 samples/sec Loss 4.8448 LearningRate 0.0327 Epoch: 8 Global Step: 106360 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:55:10,317-Speed 3342.52 samples/sec Loss 4.7975 LearningRate 0.0327 Epoch: 8 Global Step: 106370 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:55:13,391-Speed 3332.13 samples/sec Loss 4.7741 LearningRate 0.0327 Epoch: 8 Global Step: 106380 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:55:16,511-Speed 3283.10 samples/sec Loss 4.7806 LearningRate 0.0327 Epoch: 8 Global Step: 106390 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:55:19,655-Speed 3258.10 samples/sec Loss 4.7931 LearningRate 0.0327 Epoch: 8 Global Step: 106400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:22,726-Speed 3335.60 samples/sec Loss 4.8262 LearningRate 0.0327 Epoch: 8 Global Step: 106410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:25,796-Speed 3336.97 samples/sec Loss 4.8325 LearningRate 0.0327 Epoch: 8 Global Step: 106420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:28,912-Speed 3287.45 samples/sec Loss 4.8299 LearningRate 0.0327 Epoch: 8 Global Step: 106430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:32,030-Speed 3284.28 samples/sec Loss 4.7746 LearningRate 0.0327 Epoch: 8 Global Step: 106440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:35,150-Speed 3283.54 samples/sec Loss 4.8029 LearningRate 0.0327 Epoch: 8 Global Step: 106450 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 10:55:38,252-Speed 3301.90 samples/sec Loss 4.8406 LearningRate 0.0327 Epoch: 8 Global Step: 106460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:41,412-Speed 3242.05 samples/sec Loss 4.7305 LearningRate 0.0327 Epoch: 8 Global Step: 106470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:44,466-Speed 3353.67 samples/sec Loss 4.7035 LearningRate 0.0326 Epoch: 8 Global Step: 106480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:47,577-Speed 3292.42 samples/sec Loss 4.7789 LearningRate 0.0326 Epoch: 8 Global Step: 106490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:50,634-Speed 3350.72 samples/sec Loss 4.8198 LearningRate 0.0326 Epoch: 8 Global Step: 106500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:53,749-Speed 3288.78 samples/sec Loss 4.8841 LearningRate 0.0326 Epoch: 8 Global Step: 106510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:56,847-Speed 3306.66 samples/sec Loss 4.9072 LearningRate 0.0326 Epoch: 8 Global Step: 106520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:55:59,980-Speed 3269.02 samples/sec Loss 4.9244 LearningRate 0.0326 Epoch: 8 Global Step: 106530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:03,145-Speed 3236.41 samples/sec Loss 4.7711 LearningRate 0.0326 Epoch: 8 Global Step: 106540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:06,315-Speed 3231.25 samples/sec Loss 4.7675 LearningRate 0.0326 Epoch: 8 Global Step: 106550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:09,426-Speed 3292.88 samples/sec Loss 4.8622 LearningRate 0.0326 Epoch: 8 Global Step: 106560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:12,506-Speed 3325.20 samples/sec Loss 4.8474 LearningRate 0.0326 Epoch: 8 Global Step: 106570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
10:56:15,605-Speed 3305.69 samples/sec Loss 4.9255 LearningRate 0.0326 Epoch: 8 Global Step: 106580 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:18,699-Speed 3310.66 samples/sec Loss 4.7894 LearningRate 0.0326 Epoch: 8 Global Step: 106590 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:21,758-Speed 3348.36 samples/sec Loss 4.7611 LearningRate 0.0326 Epoch: 8 Global Step: 106600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:24,872-Speed 3289.52 samples/sec Loss 4.9594 LearningRate 0.0326 Epoch: 8 Global Step: 106610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:27,974-Speed 3301.46 samples/sec Loss 4.9121 LearningRate 0.0326 Epoch: 8 Global Step: 106620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:31,123-Speed 3253.17 samples/sec Loss 4.8450 LearningRate 0.0326 Epoch: 8 Global Step: 106630 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:34,203-Speed 3325.81 samples/sec Loss 4.8468 LearningRate 0.0326 Epoch: 8 Global Step: 106640 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:37,322-Speed 3284.14 samples/sec Loss 4.6997 LearningRate 0.0326 Epoch: 8 Global Step: 106650 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:40,383-Speed 3346.06 samples/sec Loss 4.8231 LearningRate 0.0326 Epoch: 8 Global Step: 106660 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:43,480-Speed 3308.09 samples/sec Loss 4.7079 LearningRate 0.0326 Epoch: 8 Global Step: 106670 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:56:46,560-Speed 3325.66 samples/sec Loss 4.8262 LearningRate 0.0326 Epoch: 8 Global Step: 106680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:49,653-Speed 3311.63 samples/sec Loss 4.8390 LearningRate 0.0325 Epoch: 8 Global Step: 106690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:52,760-Speed 3296.48 samples/sec Loss 
4.8330 LearningRate 0.0325 Epoch: 8 Global Step: 106700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:55,848-Speed 3317.36 samples/sec Loss 4.8898 LearningRate 0.0325 Epoch: 8 Global Step: 106710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:56:58,943-Speed 3309.46 samples/sec Loss 4.8143 LearningRate 0.0325 Epoch: 8 Global Step: 106720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:02,060-Speed 3286.52 samples/sec Loss 4.7698 LearningRate 0.0325 Epoch: 8 Global Step: 106730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:05,145-Speed 3320.71 samples/sec Loss 4.7654 LearningRate 0.0325 Epoch: 8 Global Step: 106740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:08,291-Speed 3255.84 samples/sec Loss 4.7617 LearningRate 0.0325 Epoch: 8 Global Step: 106750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:11,402-Speed 3292.39 samples/sec Loss 4.7814 LearningRate 0.0325 Epoch: 8 Global Step: 106760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:14,508-Speed 3297.84 samples/sec Loss 4.8154 LearningRate 0.0325 Epoch: 8 Global Step: 106770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:17,660-Speed 3250.17 samples/sec Loss 4.8252 LearningRate 0.0325 Epoch: 8 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:57:20,731-Speed 3335.41 samples/sec Loss 4.8761 LearningRate 0.0325 Epoch: 8 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:57:23,820-Speed 3316.22 samples/sec Loss 4.8446 LearningRate 0.0325 Epoch: 8 Global Step: 106800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 10:57:26,947-Speed 3275.43 samples/sec Loss 4.8290 LearningRate 0.0325 Epoch: 8 Global Step: 106810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:30,137-Speed 3211.25 samples/sec Loss 4.8296 LearningRate 0.0325 Epoch: 8 Global 
Step: 106820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:33,219-Speed 3323.63 samples/sec Loss 4.8747 LearningRate 0.0325 Epoch: 8 Global Step: 106830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:36,364-Speed 3257.42 samples/sec Loss 4.8401 LearningRate 0.0325 Epoch: 8 Global Step: 106840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:39,532-Speed 3233.57 samples/sec Loss 4.9261 LearningRate 0.0325 Epoch: 8 Global Step: 106850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:42,615-Speed 3322.11 samples/sec Loss 4.7870 LearningRate 0.0325 Epoch: 8 Global Step: 106860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:45,691-Speed 3329.59 samples/sec Loss 4.8261 LearningRate 0.0325 Epoch: 8 Global Step: 106870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:48,775-Speed 3321.78 samples/sec Loss 4.7404 LearningRate 0.0325 Epoch: 8 Global Step: 106880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:51,900-Speed 3277.12 samples/sec Loss 4.7702 LearningRate 0.0325 Epoch: 8 Global Step: 106890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:55,065-Speed 3237.50 samples/sec Loss 4.9020 LearningRate 0.0325 Epoch: 8 Global Step: 106900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:57:58,112-Speed 3361.70 samples/sec Loss 4.7854 LearningRate 0.0324 Epoch: 8 Global Step: 106910 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:01,212-Speed 3304.21 samples/sec Loss 4.8262 LearningRate 0.0324 Epoch: 8 Global Step: 106920 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:04,315-Speed 3300.08 samples/sec Loss 4.8014 LearningRate 0.0324 Epoch: 8 Global Step: 106930 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:07,447-Speed 3271.03 samples/sec Loss 4.7938 LearningRate 0.0324 Epoch: 8 Global Step: 106940 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 10:58:10,536-Speed 3315.82 samples/sec Loss 4.7565 LearningRate 0.0324 Epoch: 8 Global Step: 106950 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:13,623-Speed 3318.75 samples/sec Loss 4.8191 LearningRate 0.0324 Epoch: 8 Global Step: 106960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:16,751-Speed 3274.24 samples/sec Loss 4.7797 LearningRate 0.0324 Epoch: 8 Global Step: 106970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:19,837-Speed 3319.41 samples/sec Loss 4.8657 LearningRate 0.0324 Epoch: 8 Global Step: 106980 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:22,964-Speed 3276.20 samples/sec Loss 4.8462 LearningRate 0.0324 Epoch: 8 Global Step: 106990 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:26,116-Speed 3249.27 samples/sec Loss 4.7261 LearningRate 0.0324 Epoch: 8 Global Step: 107000 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:58:29,219-Speed 3301.08 samples/sec Loss 4.9493 LearningRate 0.0324 Epoch: 8 Global Step: 107010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:32,323-Speed 3300.14 samples/sec Loss 4.7657 LearningRate 0.0324 Epoch: 8 Global Step: 107020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:35,419-Speed 3307.97 samples/sec Loss 4.8538 LearningRate 0.0324 Epoch: 8 Global Step: 107030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:38,506-Speed 3318.80 samples/sec Loss 4.7991 LearningRate 0.0324 Epoch: 8 Global Step: 107040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:41,658-Speed 3249.24 samples/sec Loss 4.6994 LearningRate 0.0324 Epoch: 8 Global Step: 107050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:44,756-Speed 3306.90 samples/sec Loss 4.7720 LearningRate 0.0324 Epoch: 8 Global Step: 107060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
10:58:47,899-Speed 3258.77 samples/sec Loss 4.8552 LearningRate 0.0324 Epoch: 8 Global Step: 107070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:50,988-Speed 3315.70 samples/sec Loss 4.9093 LearningRate 0.0324 Epoch: 8 Global Step: 107080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:54,091-Speed 3300.44 samples/sec Loss 4.8095 LearningRate 0.0324 Epoch: 8 Global Step: 107090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:58:57,171-Speed 3326.41 samples/sec Loss 4.8527 LearningRate 0.0324 Epoch: 8 Global Step: 107100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:00,277-Speed 3297.94 samples/sec Loss 4.8346 LearningRate 0.0324 Epoch: 8 Global Step: 107110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:03,351-Speed 3332.36 samples/sec Loss 4.8781 LearningRate 0.0324 Epoch: 8 Global Step: 107120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:06,409-Speed 3350.80 samples/sec Loss 4.8335 LearningRate 0.0323 Epoch: 8 Global Step: 107130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:09,463-Speed 3353.94 samples/sec Loss 4.8290 LearningRate 0.0323 Epoch: 8 Global Step: 107140 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:12,528-Speed 3342.45 samples/sec Loss 4.8763 LearningRate 0.0323 Epoch: 8 Global Step: 107150 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:15,676-Speed 3253.14 samples/sec Loss 4.8631 LearningRate 0.0323 Epoch: 8 Global Step: 107160 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:18,750-Speed 3333.23 samples/sec Loss 4.8997 LearningRate 0.0323 Epoch: 8 Global Step: 107170 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:21,810-Speed 3347.36 samples/sec Loss 4.9055 LearningRate 0.0323 Epoch: 8 Global Step: 107180 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:24,944-Speed 3267.86 samples/sec Loss 
4.8684 LearningRate 0.0323 Epoch: 8 Global Step: 107190 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:28,014-Speed 3336.92 samples/sec Loss 4.6725 LearningRate 0.0323 Epoch: 8 Global Step: 107200 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:31,102-Speed 3317.25 samples/sec Loss 4.8659 LearningRate 0.0323 Epoch: 8 Global Step: 107210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:34,270-Speed 3233.51 samples/sec Loss 4.8271 LearningRate 0.0323 Epoch: 8 Global Step: 107220 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 10:59:37,374-Speed 3300.41 samples/sec Loss 4.8638 LearningRate 0.0323 Epoch: 8 Global Step: 107230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:40,464-Speed 3314.65 samples/sec Loss 4.8902 LearningRate 0.0323 Epoch: 8 Global Step: 107240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:43,536-Speed 3333.92 samples/sec Loss 4.8445 LearningRate 0.0323 Epoch: 8 Global Step: 107250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:46,615-Speed 3327.30 samples/sec Loss 4.9319 LearningRate 0.0323 Epoch: 8 Global Step: 107260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:49,757-Speed 3259.98 samples/sec Loss 4.8343 LearningRate 0.0323 Epoch: 8 Global Step: 107270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:52,937-Speed 3220.78 samples/sec Loss 4.7934 LearningRate 0.0323 Epoch: 8 Global Step: 107280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:56,069-Speed 3270.67 samples/sec Loss 4.8419 LearningRate 0.0323 Epoch: 8 Global Step: 107290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 10:59:59,150-Speed 3324.87 samples/sec Loss 4.8235 LearningRate 0.0323 Epoch: 8 Global Step: 107300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:02,257-Speed 3297.11 samples/sec Loss 4.8376 LearningRate 0.0323 Epoch: 8 Global 
Step: 107310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:05,372-Speed 3287.86 samples/sec Loss 4.9231 LearningRate 0.0323 Epoch: 8 Global Step: 107320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:08,446-Speed 3331.71 samples/sec Loss 4.9120 LearningRate 0.0323 Epoch: 8 Global Step: 107330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:11,518-Speed 3335.60 samples/sec Loss 4.8804 LearningRate 0.0323 Epoch: 8 Global Step: 107340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:14,613-Speed 3309.81 samples/sec Loss 4.8261 LearningRate 0.0322 Epoch: 8 Global Step: 107350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:17,733-Speed 3282.43 samples/sec Loss 4.8864 LearningRate 0.0322 Epoch: 8 Global Step: 107360 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:20,791-Speed 3350.02 samples/sec Loss 4.8649 LearningRate 0.0322 Epoch: 8 Global Step: 107370 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:23,867-Speed 3329.64 samples/sec Loss 4.9316 LearningRate 0.0322 Epoch: 8 Global Step: 107380 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:26,954-Speed 3319.21 samples/sec Loss 4.8380 LearningRate 0.0322 Epoch: 8 Global Step: 107390 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:30,074-Speed 3282.88 samples/sec Loss 4.9101 LearningRate 0.0322 Epoch: 8 Global Step: 107400 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:33,150-Speed 3329.54 samples/sec Loss 4.9034 LearningRate 0.0322 Epoch: 8 Global Step: 107410 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:36,230-Speed 3326.26 samples/sec Loss 4.9464 LearningRate 0.0322 Epoch: 8 Global Step: 107420 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:39,346-Speed 3287.27 samples/sec Loss 4.7782 LearningRate 0.0322 Epoch: 8 Global Step: 107430 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 11:00:42,464-Speed 3284.31 samples/sec Loss 4.8956 LearningRate 0.0322 Epoch: 8 Global Step: 107440 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:45,550-Speed 3319.86 samples/sec Loss 4.8377 LearningRate 0.0322 Epoch: 8 Global Step: 107450 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:48,695-Speed 3256.39 samples/sec Loss 4.8332 LearningRate 0.0322 Epoch: 8 Global Step: 107460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:51,813-Speed 3285.50 samples/sec Loss 4.9071 LearningRate 0.0322 Epoch: 8 Global Step: 107470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:00:54,908-Speed 3309.35 samples/sec Loss 4.8197 LearningRate 0.0322 Epoch: 8 Global Step: 107480 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:00:57,996-Speed 3317.16 samples/sec Loss 4.9211 LearningRate 0.0322 Epoch: 8 Global Step: 107490 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:01,120-Speed 3278.57 samples/sec Loss 4.8157 LearningRate 0.0322 Epoch: 8 Global Step: 107500 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:04,211-Speed 3313.98 samples/sec Loss 4.8115 LearningRate 0.0322 Epoch: 8 Global Step: 107510 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:07,302-Speed 3314.62 samples/sec Loss 4.8730 LearningRate 0.0322 Epoch: 8 Global Step: 107520 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:10,375-Speed 3333.04 samples/sec Loss 4.8576 LearningRate 0.0322 Epoch: 8 Global Step: 107530 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:13,491-Speed 3287.03 samples/sec Loss 4.8859 LearningRate 0.0322 Epoch: 8 Global Step: 107540 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:16,568-Speed 3329.91 samples/sec Loss 4.8496 LearningRate 0.0322 Epoch: 8 Global Step: 107550 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 
11:01:19,744-Speed 3224.24 samples/sec Loss 4.8569 LearningRate 0.0322 Epoch: 8 Global Step: 107560 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:22,824-Speed 3326.49 samples/sec Loss 4.8432 LearningRate 0.0321 Epoch: 8 Global Step: 107570 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:01:26,013-Speed 3212.16 samples/sec Loss 4.9072 LearningRate 0.0321 Epoch: 8 Global Step: 107580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:29,229-Speed 3184.55 samples/sec Loss 4.8568 LearningRate 0.0321 Epoch: 8 Global Step: 107590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:32,327-Speed 3305.86 samples/sec Loss 4.8374 LearningRate 0.0321 Epoch: 8 Global Step: 107600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:35,384-Speed 3352.01 samples/sec Loss 4.8775 LearningRate 0.0321 Epoch: 8 Global Step: 107610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:38,498-Speed 3288.63 samples/sec Loss 4.7904 LearningRate 0.0321 Epoch: 8 Global Step: 107620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:41,593-Speed 3310.06 samples/sec Loss 4.8600 LearningRate 0.0321 Epoch: 8 Global Step: 107630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:44,713-Speed 3283.05 samples/sec Loss 4.8743 LearningRate 0.0321 Epoch: 8 Global Step: 107640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:47,841-Speed 3275.11 samples/sec Loss 4.9563 LearningRate 0.0321 Epoch: 8 Global Step: 107650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:50,977-Speed 3267.49 samples/sec Loss 4.8087 LearningRate 0.0321 Epoch: 8 Global Step: 107660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:54,107-Speed 3273.39 samples/sec Loss 4.8351 LearningRate 0.0321 Epoch: 8 Global Step: 107670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:01:57,180-Speed 3332.64 samples/sec Loss 
4.8937 LearningRate 0.0321 Epoch: 8 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:02:00,284-Speed 3299.87 samples/sec Loss 4.8285 LearningRate 0.0321 Epoch: 8 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:02:03,421-Speed 3265.86 samples/sec Loss 4.7419 LearningRate 0.0321 Epoch: 8 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:02:06,555-Speed 3268.65 samples/sec Loss 4.8427 LearningRate 0.0321 Epoch: 8 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:02:09,645-Speed 3314.45 samples/sec Loss 4.7997 LearningRate 0.0321 Epoch: 8 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:02:12,733-Speed 3317.08 samples/sec Loss 4.8704 LearningRate 0.0321 Epoch: 8 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:02:15,793-Speed 3347.78 samples/sec Loss 4.8975 LearningRate 0.0321 Epoch: 8 Global Step: 107740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:02:18,877-Speed 3320.71 samples/sec Loss 4.8396 LearningRate 0.0321 Epoch: 8 Global Step: 107750 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:21,928-Speed 3357.45 samples/sec Loss 4.8082 LearningRate 0.0321 Epoch: 8 Global Step: 107760 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:25,129-Speed 3200.37 samples/sec Loss 4.9223 LearningRate 0.0321 Epoch: 8 Global Step: 107770 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:28,286-Speed 3245.02 samples/sec Loss 4.8393 LearningRate 0.0321 Epoch: 8 Global Step: 107780 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:31,389-Speed 3300.53 samples/sec Loss 4.9053 LearningRate 0.0320 Epoch: 8 Global Step: 107790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:34,451-Speed 3345.69 samples/sec Loss 4.7827 LearningRate 0.0320 Epoch: 8 Global 
Step: 107800 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:37,584-Speed 3269.33 samples/sec Loss 4.8844 LearningRate 0.0320 Epoch: 8 Global Step: 107810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:40,655-Speed 3335.24 samples/sec Loss 4.8967 LearningRate 0.0320 Epoch: 8 Global Step: 107820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:43,822-Speed 3234.74 samples/sec Loss 4.8927 LearningRate 0.0320 Epoch: 8 Global Step: 107830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:46,897-Speed 3330.53 samples/sec Loss 4.8184 LearningRate 0.0320 Epoch: 8 Global Step: 107840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:02:50,059-Speed 3240.14 samples/sec Loss 4.8927 LearningRate 0.0320 Epoch: 8 Global Step: 107850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:02:53,224-Speed 3235.96 samples/sec Loss 4.9321 LearningRate 0.0320 Epoch: 8 Global Step: 107860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:02:56,285-Speed 3346.41 samples/sec Loss 4.8954 LearningRate 0.0320 Epoch: 8 Global Step: 107870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:02:59,363-Speed 3327.67 samples/sec Loss 4.8163 LearningRate 0.0320 Epoch: 8 Global Step: 107880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:02,501-Speed 3264.13 samples/sec Loss 4.8008 LearningRate 0.0320 Epoch: 8 Global Step: 107890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:05,613-Speed 3292.59 samples/sec Loss 4.7146 LearningRate 0.0320 Epoch: 8 Global Step: 107900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:08,711-Speed 3306.03 samples/sec Loss 4.8080 LearningRate 0.0320 Epoch: 8 Global Step: 107910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:11,844-Speed 3269.60 samples/sec Loss 4.8615 LearningRate 0.0320 Epoch: 8 Global Step: 107920 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:03:14,952-Speed 3294.87 samples/sec Loss 4.8252 LearningRate 0.0320 Epoch: 8 Global Step: 107930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:18,045-Speed 3312.35 samples/sec Loss 4.8937 LearningRate 0.0320 Epoch: 8 Global Step: 107940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:21,129-Speed 3321.33 samples/sec Loss 4.8889 LearningRate 0.0320 Epoch: 8 Global Step: 107950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:03:24,189-Speed 3348.10 samples/sec Loss 4.8617 LearningRate 0.0320 Epoch: 8 Global Step: 107960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:03:27,305-Speed 3287.00 samples/sec Loss 4.8285 LearningRate 0.0320 Epoch: 8 Global Step: 107970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:30,413-Speed 3295.57 samples/sec Loss 4.7938 LearningRate 0.0320 Epoch: 8 Global Step: 107980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:33,508-Speed 3309.91 samples/sec Loss 4.8971 LearningRate 0.0320 Epoch: 8 Global Step: 107990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:36,650-Speed 3259.38 samples/sec Loss 4.8383 LearningRate 0.0320 Epoch: 8 Global Step: 108000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:39,717-Speed 3339.81 samples/sec Loss 4.9204 LearningRate 0.0319 Epoch: 8 Global Step: 108010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:42,818-Speed 3303.46 samples/sec Loss 4.9242 LearningRate 0.0319 Epoch: 8 Global Step: 108020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:45,886-Speed 3339.53 samples/sec Loss 4.7549 LearningRate 0.0319 Epoch: 8 Global Step: 108030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:48,966-Speed 3325.29 samples/sec Loss 4.8359 LearningRate 0.0319 Epoch: 8 Global Step: 108040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
11:03:52,135-Speed 3232.99 samples/sec Loss 4.8664 LearningRate 0.0319 Epoch: 8 Global Step: 108050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:55,256-Speed 3281.80 samples/sec Loss 4.8885 LearningRate 0.0319 Epoch: 8 Global Step: 108060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:03:58,318-Speed 3345.14 samples/sec Loss 4.8362 LearningRate 0.0319 Epoch: 8 Global Step: 108070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:01,456-Speed 3264.23 samples/sec Loss 4.7950 LearningRate 0.0319 Epoch: 8 Global Step: 108080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:04,564-Speed 3296.52 samples/sec Loss 4.8015 LearningRate 0.0319 Epoch: 8 Global Step: 108090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:07,712-Speed 3253.75 samples/sec Loss 4.8908 LearningRate 0.0319 Epoch: 8 Global Step: 108100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:10,822-Speed 3293.08 samples/sec Loss 4.8852 LearningRate 0.0319 Epoch: 8 Global Step: 108110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:13,881-Speed 3348.54 samples/sec Loss 4.9081 LearningRate 0.0319 Epoch: 8 Global Step: 108120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:17,003-Speed 3281.08 samples/sec Loss 4.8186 LearningRate 0.0319 Epoch: 8 Global Step: 108130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:20,153-Speed 3251.84 samples/sec Loss 4.8346 LearningRate 0.0319 Epoch: 8 Global Step: 108140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:23,225-Speed 3334.18 samples/sec Loss 4.8573 LearningRate 0.0319 Epoch: 8 Global Step: 108150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:26,360-Speed 3267.55 samples/sec Loss 4.8803 LearningRate 0.0319 Epoch: 8 Global Step: 108160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:29,430-Speed 3337.34 samples/sec Loss 
4.8097 LearningRate 0.0319 Epoch: 8 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:04:32,501-Speed 3335.07 samples/sec Loss 4.8559 LearningRate 0.0319 Epoch: 8 Global Step: 108180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:04:35,650-Speed 3253.18 samples/sec Loss 4.7822 LearningRate 0.0319 Epoch: 8 Global Step: 108190 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:04:38,785-Speed 3266.82 samples/sec Loss 4.7164 LearningRate 0.0319 Epoch: 8 Global Step: 108200 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:04:41,866-Speed 3325.05 samples/sec Loss 4.8631 LearningRate 0.0319 Epoch: 8 Global Step: 108210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:04:44,972-Speed 3297.52 samples/sec Loss 4.8293 LearningRate 0.0319 Epoch: 8 Global Step: 108220 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:04:48,148-Speed 3225.38 samples/sec Loss 4.9706 LearningRate 0.0318 Epoch: 8 Global Step: 108230 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:04:51,216-Speed 3338.73 samples/sec Loss 4.9352 LearningRate 0.0318 Epoch: 8 Global Step: 108240 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:04:54,301-Speed 3321.20 samples/sec Loss 4.7464 LearningRate 0.0318 Epoch: 8 Global Step: 108250 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:04:57,371-Speed 3336.47 samples/sec Loss 4.8077 LearningRate 0.0318 Epoch: 8 Global Step: 108260 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:05:00,463-Speed 3312.97 samples/sec Loss 4.7966 LearningRate 0.0318 Epoch: 8 Global Step: 108270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:05:03,553-Speed 3314.77 samples/sec Loss 4.8364 LearningRate 0.0318 Epoch: 8 Global Step: 108280 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:05:06,677-Speed 3278.78 samples/sec Loss 4.9654 LearningRate 0.0318 Epoch: 8 Global 
Step: 108290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:09,779-Speed 3302.04 samples/sec Loss 4.8717 LearningRate 0.0318 Epoch: 8 Global Step: 108300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:12,953-Speed 3226.73 samples/sec Loss 4.8568 LearningRate 0.0318 Epoch: 8 Global Step: 108310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:16,092-Speed 3263.49 samples/sec Loss 4.8968 LearningRate 0.0318 Epoch: 8 Global Step: 108320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:19,206-Speed 3288.90 samples/sec Loss 4.8819 LearningRate 0.0318 Epoch: 8 Global Step: 108330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:22,302-Speed 3309.29 samples/sec Loss 4.8550 LearningRate 0.0318 Epoch: 8 Global Step: 108340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:25,458-Speed 3244.93 samples/sec Loss 4.7535 LearningRate 0.0318 Epoch: 8 Global Step: 108350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:28,585-Speed 3276.20 samples/sec Loss 4.8732 LearningRate 0.0318 Epoch: 8 Global Step: 108360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:31,701-Speed 3286.78 samples/sec Loss 4.8899 LearningRate 0.0318 Epoch: 8 Global Step: 108370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:34,818-Speed 3286.40 samples/sec Loss 4.8988 LearningRate 0.0318 Epoch: 8 Global Step: 108380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:37,908-Speed 3315.04 samples/sec Loss 4.8581 LearningRate 0.0318 Epoch: 8 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:05:40,950-Speed 3367.40 samples/sec Loss 4.9179 LearningRate 0.0318 Epoch: 8 Global Step: 108400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:44,000-Speed 3358.23 samples/sec Loss 4.8447 LearningRate 0.0318 Epoch: 8 Global Step: 108410 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:05:47,057-Speed 3351.78 samples/sec Loss 4.8945 LearningRate 0.0318 Epoch: 8 Global Step: 108420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:50,210-Speed 3248.06 samples/sec Loss 4.8509 LearningRate 0.0318 Epoch: 8 Global Step: 108430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:53,341-Speed 3271.20 samples/sec Loss 4.9069 LearningRate 0.0318 Epoch: 8 Global Step: 108440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:05:57,032-Speed 2775.40 samples/sec Loss 4.7967 LearningRate 0.0317 Epoch: 8 Global Step: 108450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:00,155-Speed 3280.09 samples/sec Loss 4.8557 LearningRate 0.0317 Epoch: 8 Global Step: 108460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:03,271-Speed 3287.22 samples/sec Loss 4.8913 LearningRate 0.0317 Epoch: 8 Global Step: 108470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:06,385-Speed 3290.17 samples/sec Loss 4.8136 LearningRate 0.0317 Epoch: 8 Global Step: 108480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:09,418-Speed 3376.54 samples/sec Loss 4.8466 LearningRate 0.0317 Epoch: 8 Global Step: 108490 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:12,565-Speed 3255.51 samples/sec Loss 4.8118 LearningRate 0.0317 Epoch: 8 Global Step: 108500 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:15,653-Speed 3316.75 samples/sec Loss 4.8239 LearningRate 0.0317 Epoch: 8 Global Step: 108510 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:18,746-Speed 3312.28 samples/sec Loss 4.8367 LearningRate 0.0317 Epoch: 8 Global Step: 108520 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:21,816-Speed 3335.87 samples/sec Loss 4.9848 LearningRate 0.0317 Epoch: 8 Global Step: 108530 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 
11:06:24,957-Speed 3260.72 samples/sec Loss 4.8344 LearningRate 0.0317 Epoch: 8 Global Step: 108540 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:28,068-Speed 3293.62 samples/sec Loss 4.8835 LearningRate 0.0317 Epoch: 8 Global Step: 108550 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:31,142-Speed 3331.90 samples/sec Loss 4.8451 LearningRate 0.0317 Epoch: 8 Global Step: 108560 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:34,249-Speed 3296.79 samples/sec Loss 4.8171 LearningRate 0.0317 Epoch: 8 Global Step: 108570 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:37,458-Speed 3192.47 samples/sec Loss 4.8881 LearningRate 0.0317 Epoch: 8 Global Step: 108580 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:06:40,604-Speed 3255.25 samples/sec Loss 4.8687 LearningRate 0.0317 Epoch: 8 Global Step: 108590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:43,703-Speed 3305.51 samples/sec Loss 4.9300 LearningRate 0.0317 Epoch: 8 Global Step: 108600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:46,801-Speed 3308.22 samples/sec Loss 4.8384 LearningRate 0.0317 Epoch: 8 Global Step: 108610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:49,892-Speed 3313.67 samples/sec Loss 4.8274 LearningRate 0.0317 Epoch: 8 Global Step: 108620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:53,023-Speed 3271.34 samples/sec Loss 4.6870 LearningRate 0.0317 Epoch: 8 Global Step: 108630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:56,172-Speed 3252.78 samples/sec Loss 4.9044 LearningRate 0.0317 Epoch: 8 Global Step: 108640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:06:59,268-Speed 3308.59 samples/sec Loss 4.8379 LearningRate 0.0317 Epoch: 8 Global Step: 108650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:02,358-Speed 3314.63 samples/sec Loss 
4.7970 LearningRate 0.0317 Epoch: 8 Global Step: 108660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:05,436-Speed 3328.01 samples/sec Loss 4.7917 LearningRate 0.0316 Epoch: 8 Global Step: 108670 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:08,543-Speed 3296.94 samples/sec Loss 4.8224 LearningRate 0.0316 Epoch: 8 Global Step: 108680 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:11,698-Speed 3246.07 samples/sec Loss 4.8257 LearningRate 0.0316 Epoch: 8 Global Step: 108690 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:14,842-Speed 3257.99 samples/sec Loss 4.7919 LearningRate 0.0316 Epoch: 8 Global Step: 108700 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:17,971-Speed 3274.07 samples/sec Loss 4.8600 LearningRate 0.0316 Epoch: 8 Global Step: 108710 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:21,093-Speed 3281.55 samples/sec Loss 4.8405 LearningRate 0.0316 Epoch: 8 Global Step: 108720 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:24,196-Speed 3300.54 samples/sec Loss 4.7873 LearningRate 0.0316 Epoch: 8 Global Step: 108730 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:27,297-Speed 3302.61 samples/sec Loss 4.8026 LearningRate 0.0316 Epoch: 8 Global Step: 108740 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:30,406-Speed 3295.41 samples/sec Loss 4.8515 LearningRate 0.0316 Epoch: 8 Global Step: 108750 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:34,811-Speed 2325.03 samples/sec Loss 4.9120 LearningRate 0.0316 Epoch: 8 Global Step: 108760 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:07:39,673-Speed 2106.79 samples/sec Loss 4.8782 LearningRate 0.0316 Epoch: 8 Global Step: 108770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:42,786-Speed 3290.49 samples/sec Loss 4.9277 LearningRate 0.0316 Epoch: 8 Global 
Step: 108780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:45,857-Speed 3335.27 samples/sec Loss 4.8952 LearningRate 0.0316 Epoch: 8 Global Step: 108790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:48,978-Speed 3282.32 samples/sec Loss 4.8400 LearningRate 0.0316 Epoch: 8 Global Step: 108800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:52,116-Speed 3264.14 samples/sec Loss 4.8153 LearningRate 0.0316 Epoch: 8 Global Step: 108810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:55,226-Speed 3292.58 samples/sec Loss 4.8793 LearningRate 0.0316 Epoch: 8 Global Step: 108820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:07:58,319-Speed 3313.00 samples/sec Loss 4.8119 LearningRate 0.0316 Epoch: 8 Global Step: 108830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:01,508-Speed 3211.17 samples/sec Loss 4.7919 LearningRate 0.0316 Epoch: 8 Global Step: 108840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:04,658-Speed 3252.05 samples/sec Loss 4.8467 LearningRate 0.0316 Epoch: 8 Global Step: 108850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:07,835-Speed 3224.14 samples/sec Loss 4.8847 LearningRate 0.0316 Epoch: 8 Global Step: 108860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:10,906-Speed 3335.40 samples/sec Loss 4.8588 LearningRate 0.0316 Epoch: 8 Global Step: 108870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:14,056-Speed 3251.98 samples/sec Loss 4.8263 LearningRate 0.0316 Epoch: 8 Global Step: 108880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:17,164-Speed 3296.16 samples/sec Loss 4.8821 LearningRate 0.0315 Epoch: 8 Global Step: 108890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:20,230-Speed 3341.20 samples/sec Loss 4.7699 LearningRate 0.0315 Epoch: 8 Global Step: 108900 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:08:23,325-Speed 3308.90 samples/sec Loss 4.9519 LearningRate 0.0315 Epoch: 8 Global Step: 108910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:26,479-Speed 3247.20 samples/sec Loss 4.7888 LearningRate 0.0315 Epoch: 8 Global Step: 108920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:29,562-Speed 3323.04 samples/sec Loss 4.9239 LearningRate 0.0315 Epoch: 8 Global Step: 108930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:32,643-Speed 3324.88 samples/sec Loss 4.9050 LearningRate 0.0315 Epoch: 8 Global Step: 108940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:35,799-Speed 3245.92 samples/sec Loss 4.9304 LearningRate 0.0315 Epoch: 8 Global Step: 108950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:38,910-Speed 3292.10 samples/sec Loss 4.8497 LearningRate 0.0315 Epoch: 8 Global Step: 108960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:41,997-Speed 3318.43 samples/sec Loss 4.8669 LearningRate 0.0315 Epoch: 8 Global Step: 108970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:45,088-Speed 3313.90 samples/sec Loss 4.7133 LearningRate 0.0315 Epoch: 8 Global Step: 108980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:48,181-Speed 3311.41 samples/sec Loss 4.8797 LearningRate 0.0315 Epoch: 8 Global Step: 108990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:51,273-Speed 3312.91 samples/sec Loss 4.8218 LearningRate 0.0315 Epoch: 8 Global Step: 109000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:54,364-Speed 3313.74 samples/sec Loss 4.9640 LearningRate 0.0315 Epoch: 8 Global Step: 109010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:08:57,418-Speed 3354.31 samples/sec Loss 4.8582 LearningRate 0.0315 Epoch: 8 Global Step: 109020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
11:09:00,471-Speed 3355.02 samples/sec Loss 4.9551 LearningRate 0.0315 Epoch: 8 Global Step: 109030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:03,574-Speed 3301.02 samples/sec Loss 4.9109 LearningRate 0.0315 Epoch: 8 Global Step: 109040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:06,675-Speed 3303.31 samples/sec Loss 4.9102 LearningRate 0.0315 Epoch: 8 Global Step: 109050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:09,793-Speed 3285.33 samples/sec Loss 4.7866 LearningRate 0.0315 Epoch: 8 Global Step: 109060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:12,924-Speed 3272.29 samples/sec Loss 4.8805 LearningRate 0.0315 Epoch: 8 Global Step: 109070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:09:16,041-Speed 3286.06 samples/sec Loss 4.8465 LearningRate 0.0315 Epoch: 8 Global Step: 109080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:19,149-Speed 3294.99 samples/sec Loss 4.9119 LearningRate 0.0315 Epoch: 8 Global Step: 109090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:22,246-Speed 3308.01 samples/sec Loss 4.8312 LearningRate 0.0315 Epoch: 8 Global Step: 109100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:25,428-Speed 3218.76 samples/sec Loss 4.8415 LearningRate 0.0314 Epoch: 8 Global Step: 109110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:28,515-Speed 3318.51 samples/sec Loss 4.8716 LearningRate 0.0314 Epoch: 8 Global Step: 109120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:31,641-Speed 3276.63 samples/sec Loss 4.8902 LearningRate 0.0314 Epoch: 8 Global Step: 109130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:34,714-Speed 3333.77 samples/sec Loss 4.7684 LearningRate 0.0314 Epoch: 8 Global Step: 109140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:37,796-Speed 3325.22 samples/sec Loss 
4.8697 LearningRate 0.0314 Epoch: 8 Global Step: 109150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:40,874-Speed 3327.60 samples/sec Loss 4.9367 LearningRate 0.0314 Epoch: 8 Global Step: 109160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:43,952-Speed 3327.58 samples/sec Loss 5.0028 LearningRate 0.0314 Epoch: 8 Global Step: 109170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:47,048-Speed 3308.44 samples/sec Loss 4.8870 LearningRate 0.0314 Epoch: 8 Global Step: 109180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:50,161-Speed 3290.47 samples/sec Loss 4.7661 LearningRate 0.0314 Epoch: 8 Global Step: 109190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:53,290-Speed 3274.04 samples/sec Loss 4.8870 LearningRate 0.0314 Epoch: 8 Global Step: 109200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:56,356-Speed 3341.16 samples/sec Loss 4.8366 LearningRate 0.0314 Epoch: 8 Global Step: 109210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:09:59,437-Speed 3324.45 samples/sec Loss 4.8214 LearningRate 0.0314 Epoch: 8 Global Step: 109220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:02,500-Speed 3344.05 samples/sec Loss 4.7812 LearningRate 0.0314 Epoch: 8 Global Step: 109230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:05,617-Speed 3286.45 samples/sec Loss 4.9447 LearningRate 0.0314 Epoch: 8 Global Step: 109240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:08,688-Speed 3334.75 samples/sec Loss 4.7968 LearningRate 0.0314 Epoch: 8 Global Step: 109250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:11,793-Speed 3299.55 samples/sec Loss 4.8940 LearningRate 0.0314 Epoch: 8 Global Step: 109260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:14,908-Speed 3288.55 samples/sec Loss 4.9357 LearningRate 0.0314 Epoch: 8 Global 
Step: 109270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:18,032-Speed 3279.42 samples/sec Loss 4.9612 LearningRate 0.0314 Epoch: 8 Global Step: 109280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:10:21,103-Speed 3334.81 samples/sec Loss 4.9177 LearningRate 0.0314 Epoch: 8 Global Step: 109290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:10:24,179-Speed 3330.79 samples/sec Loss 4.8322 LearningRate 0.0314 Epoch: 8 Global Step: 109300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:27,282-Speed 3300.97 samples/sec Loss 4.8393 LearningRate 0.0314 Epoch: 8 Global Step: 109310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:30,393-Speed 3291.73 samples/sec Loss 4.8203 LearningRate 0.0314 Epoch: 8 Global Step: 109320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:33,453-Speed 3347.24 samples/sec Loss 4.8197 LearningRate 0.0313 Epoch: 8 Global Step: 109330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:36,633-Speed 3221.37 samples/sec Loss 4.7626 LearningRate 0.0313 Epoch: 8 Global Step: 109340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:39,755-Speed 3280.85 samples/sec Loss 4.9078 LearningRate 0.0313 Epoch: 8 Global Step: 109350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:42,913-Speed 3244.40 samples/sec Loss 4.9136 LearningRate 0.0313 Epoch: 8 Global Step: 109360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:45,977-Speed 3342.69 samples/sec Loss 4.8885 LearningRate 0.0313 Epoch: 8 Global Step: 109370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:49,076-Speed 3305.72 samples/sec Loss 4.7976 LearningRate 0.0313 Epoch: 8 Global Step: 109380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:10:52,141-Speed 3341.03 samples/sec Loss 4.9283 LearningRate 0.0313 Epoch: 8 Global Step: 109390 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:10:55,182-Speed 3368.82 samples/sec Loss 4.9291 LearningRate 0.0313 Epoch: 8 Global Step: 109400 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:10:58,260-Speed 3327.78 samples/sec Loss 4.9097 LearningRate 0.0313 Epoch: 8 Global Step: 109410 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:01,346-Speed 3319.66 samples/sec Loss 4.8746 LearningRate 0.0313 Epoch: 8 Global Step: 109420 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:04,464-Speed 3284.34 samples/sec Loss 4.8869 LearningRate 0.0313 Epoch: 8 Global Step: 109430 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:07,576-Speed 3292.45 samples/sec Loss 4.8071 LearningRate 0.0313 Epoch: 8 Global Step: 109440 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:10,672-Speed 3308.04 samples/sec Loss 4.9304 LearningRate 0.0313 Epoch: 8 Global Step: 109450 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:13,835-Speed 3239.02 samples/sec Loss 4.9796 LearningRate 0.0313 Epoch: 8 Global Step: 109460 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:17,029-Speed 3206.51 samples/sec Loss 4.9136 LearningRate 0.0313 Epoch: 8 Global Step: 109470 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:20,144-Speed 3288.34 samples/sec Loss 4.8204 LearningRate 0.0313 Epoch: 8 Global Step: 109480 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:23,229-Speed 3320.21 samples/sec Loss 4.8548 LearningRate 0.0313 Epoch: 8 Global Step: 109490 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:11:26,336-Speed 3297.15 samples/sec Loss 4.8944 LearningRate 0.0313 Epoch: 8 Global Step: 109500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:29,409-Speed 3333.26 samples/sec Loss 4.8143 LearningRate 0.0313 Epoch: 8 Global Step: 109510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
11:11:32,508-Speed 3305.69 samples/sec Loss 4.9208 LearningRate 0.0313 Epoch: 8 Global Step: 109520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:35,675-Speed 3233.94 samples/sec Loss 4.8427 LearningRate 0.0313 Epoch: 8 Global Step: 109530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:38,842-Speed 3234.66 samples/sec Loss 4.7879 LearningRate 0.0313 Epoch: 8 Global Step: 109540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:41,974-Speed 3270.45 samples/sec Loss 4.8389 LearningRate 0.0312 Epoch: 8 Global Step: 109550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:45,053-Speed 3326.46 samples/sec Loss 4.9299 LearningRate 0.0312 Epoch: 8 Global Step: 109560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:48,208-Speed 3246.78 samples/sec Loss 4.8063 LearningRate 0.0312 Epoch: 8 Global Step: 109570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:51,364-Speed 3245.99 samples/sec Loss 4.9092 LearningRate 0.0312 Epoch: 8 Global Step: 109580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:54,503-Speed 3263.21 samples/sec Loss 4.8123 LearningRate 0.0312 Epoch: 8 Global Step: 109590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:11:57,640-Speed 3265.51 samples/sec Loss 4.8874 LearningRate 0.0312 Epoch: 8 Global Step: 109600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:00,829-Speed 3211.80 samples/sec Loss 4.8779 LearningRate 0.0312 Epoch: 8 Global Step: 109610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:03,922-Speed 3311.24 samples/sec Loss 4.8397 LearningRate 0.0312 Epoch: 8 Global Step: 109620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:06,996-Speed 3333.01 samples/sec Loss 4.8437 LearningRate 0.0312 Epoch: 8 Global Step: 109630 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:10,074-Speed 3327.34 samples/sec Loss 
4.9163 LearningRate 0.0312 Epoch: 8 Global Step: 109640 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:13,202-Speed 3274.34 samples/sec Loss 4.8494 LearningRate 0.0312 Epoch: 8 Global Step: 109650 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:16,273-Speed 3335.87 samples/sec Loss 4.8122 LearningRate 0.0312 Epoch: 8 Global Step: 109660 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:19,385-Speed 3292.11 samples/sec Loss 4.9407 LearningRate 0.0312 Epoch: 8 Global Step: 109670 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:22,476-Speed 3313.21 samples/sec Loss 4.9312 LearningRate 0.0312 Epoch: 8 Global Step: 109680 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:25,613-Speed 3265.17 samples/sec Loss 4.9931 LearningRate 0.0312 Epoch: 8 Global Step: 109690 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:12:28,803-Speed 3211.95 samples/sec Loss 4.8598 LearningRate 0.0312 Epoch: 8 Global Step: 109700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:31,903-Speed 3304.06 samples/sec Loss 4.9365 LearningRate 0.0312 Epoch: 8 Global Step: 109710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:35,012-Speed 3294.13 samples/sec Loss 4.9373 LearningRate 0.0312 Epoch: 8 Global Step: 109720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:38,184-Speed 3229.42 samples/sec Loss 4.8793 LearningRate 0.0312 Epoch: 8 Global Step: 109730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:41,399-Speed 3186.15 samples/sec Loss 4.9069 LearningRate 0.0312 Epoch: 8 Global Step: 109740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:44,551-Speed 3249.95 samples/sec Loss 4.6950 LearningRate 0.0312 Epoch: 8 Global Step: 109750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:47,685-Speed 3267.76 samples/sec Loss 4.9282 LearningRate 0.0312 Epoch: 8 Global 
Step: 109760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:50,829-Speed 3258.60 samples/sec Loss 4.8251 LearningRate 0.0312 Epoch: 8 Global Step: 109770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:53,933-Speed 3300.52 samples/sec Loss 4.7805 LearningRate 0.0311 Epoch: 8 Global Step: 109780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:12:57,026-Speed 3311.12 samples/sec Loss 4.8537 LearningRate 0.0311 Epoch: 8 Global Step: 109790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:00,153-Speed 3276.20 samples/sec Loss 4.8828 LearningRate 0.0311 Epoch: 8 Global Step: 109800 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:03,343-Speed 3211.44 samples/sec Loss 4.8896 LearningRate 0.0311 Epoch: 8 Global Step: 109810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:06,533-Speed 3210.98 samples/sec Loss 4.9266 LearningRate 0.0311 Epoch: 8 Global Step: 109820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:09,629-Speed 3308.98 samples/sec Loss 4.8841 LearningRate 0.0311 Epoch: 8 Global Step: 109830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:12,822-Speed 3207.14 samples/sec Loss 4.8338 LearningRate 0.0311 Epoch: 8 Global Step: 109840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:16,030-Speed 3193.32 samples/sec Loss 4.8732 LearningRate 0.0311 Epoch: 8 Global Step: 109850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:19,165-Speed 3267.55 samples/sec Loss 4.9820 LearningRate 0.0311 Epoch: 8 Global Step: 109860 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:22,239-Speed 3332.09 samples/sec Loss 4.8228 LearningRate 0.0311 Epoch: 8 Global Step: 109870 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:13:25,357-Speed 3285.17 samples/sec Loss 4.8476 LearningRate 0.0311 Epoch: 8 Global Step: 109880 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 11:13:28,450-Speed 3312.35 samples/sec Loss 4.8011 LearningRate 0.0311 Epoch: 8 Global Step: 109890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:31,589-Speed 3262.18 samples/sec Loss 4.9359 LearningRate 0.0311 Epoch: 8 Global Step: 109900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:34,707-Speed 3285.32 samples/sec Loss 4.9028 LearningRate 0.0311 Epoch: 8 Global Step: 109910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:37,841-Speed 3268.82 samples/sec Loss 4.9013 LearningRate 0.0311 Epoch: 8 Global Step: 109920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:41,039-Speed 3202.77 samples/sec Loss 4.8976 LearningRate 0.0311 Epoch: 8 Global Step: 109930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:44,171-Speed 3271.68 samples/sec Loss 4.8970 LearningRate 0.0311 Epoch: 8 Global Step: 109940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:47,317-Speed 3255.18 samples/sec Loss 4.8067 LearningRate 0.0311 Epoch: 8 Global Step: 109950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:50,479-Speed 3239.27 samples/sec Loss 4.8594 LearningRate 0.0311 Epoch: 8 Global Step: 109960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:53,600-Speed 3281.94 samples/sec Loss 4.8191 LearningRate 0.0311 Epoch: 8 Global Step: 109970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:56,669-Speed 3337.60 samples/sec Loss 4.9167 LearningRate 0.0311 Epoch: 8 Global Step: 109980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:13:59,740-Speed 3335.90 samples/sec Loss 4.9054 LearningRate 0.0311 Epoch: 8 Global Step: 109990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:02,865-Speed 3277.95 samples/sec Loss 4.8000 LearningRate 0.0310 Epoch: 8 Global Step: 110000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
11:14:05,926-Speed 3345.84 samples/sec Loss 4.8714 LearningRate 0.0310 Epoch: 8 Global Step: 110010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:08,971-Speed 3364.05 samples/sec Loss 4.9177 LearningRate 0.0310 Epoch: 8 Global Step: 110020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:12,091-Speed 3283.31 samples/sec Loss 4.9894 LearningRate 0.0310 Epoch: 8 Global Step: 110030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:15,199-Speed 3295.98 samples/sec Loss 4.7432 LearningRate 0.0310 Epoch: 8 Global Step: 110040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:18,343-Speed 3257.79 samples/sec Loss 4.9296 LearningRate 0.0310 Epoch: 8 Global Step: 110050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:21,464-Speed 3281.63 samples/sec Loss 4.9153 LearningRate 0.0310 Epoch: 8 Global Step: 110060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:24,612-Speed 3254.95 samples/sec Loss 4.8876 LearningRate 0.0310 Epoch: 8 Global Step: 110070 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:27,693-Speed 3324.56 samples/sec Loss 4.7754 LearningRate 0.0310 Epoch: 8 Global Step: 110080 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:30,818-Speed 3277.98 samples/sec Loss 4.8851 LearningRate 0.0310 Epoch: 8 Global Step: 110090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:33,952-Speed 3268.32 samples/sec Loss 4.8181 LearningRate 0.0310 Epoch: 8 Global Step: 110100 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:37,078-Speed 3277.35 samples/sec Loss 4.9152 LearningRate 0.0310 Epoch: 8 Global Step: 110110 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:40,171-Speed 3310.77 samples/sec Loss 4.8840 LearningRate 0.0310 Epoch: 8 Global Step: 110120 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:43,292-Speed 3283.12 samples/sec Loss 
4.8480 LearningRate 0.0310 Epoch: 8 Global Step: 110130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:46,338-Speed 3362.37 samples/sec Loss 4.8332 LearningRate 0.0310 Epoch: 8 Global Step: 110140 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:49,403-Speed 3341.52 samples/sec Loss 4.8346 LearningRate 0.0310 Epoch: 8 Global Step: 110150 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:52,547-Speed 3258.49 samples/sec Loss 4.8345 LearningRate 0.0310 Epoch: 8 Global Step: 110160 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:14:55,708-Speed 3241.19 samples/sec Loss 4.8050 LearningRate 0.0310 Epoch: 8 Global Step: 110170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:14:58,757-Speed 3359.36 samples/sec Loss 4.8962 LearningRate 0.0310 Epoch: 8 Global Step: 110180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:01,916-Speed 3242.28 samples/sec Loss 4.9190 LearningRate 0.0310 Epoch: 8 Global Step: 110190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:05,077-Speed 3240.60 samples/sec Loss 4.7583 LearningRate 0.0310 Epoch: 8 Global Step: 110200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:08,194-Speed 3286.33 samples/sec Loss 4.8167 LearningRate 0.0310 Epoch: 8 Global Step: 110210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:11,257-Speed 3344.66 samples/sec Loss 4.8635 LearningRate 0.0309 Epoch: 8 Global Step: 110220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:14,441-Speed 3216.77 samples/sec Loss 4.7840 LearningRate 0.0309 Epoch: 8 Global Step: 110230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:17,577-Speed 3265.91 samples/sec Loss 4.7223 LearningRate 0.0309 Epoch: 8 Global Step: 110240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:20,626-Speed 3359.60 samples/sec Loss 4.9596 LearningRate 0.0309 Epoch: 8 Global 
Step: 110250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:23,734-Speed 3295.98 samples/sec Loss 4.8214 LearningRate 0.0309 Epoch: 8 Global Step: 110260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:26,920-Speed 3214.85 samples/sec Loss 4.8129 LearningRate 0.0309 Epoch: 8 Global Step: 110270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:15:29,982-Speed 3345.42 samples/sec Loss 4.8319 LearningRate 0.0309 Epoch: 8 Global Step: 110280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:15:33,071-Speed 3315.59 samples/sec Loss 4.8311 LearningRate 0.0309 Epoch: 8 Global Step: 110290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:15:36,175-Speed 3300.41 samples/sec Loss 4.9273 LearningRate 0.0309 Epoch: 8 Global Step: 110300 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:15:39,259-Speed 3321.03 samples/sec Loss 4.9200 LearningRate 0.0309 Epoch: 8 Global Step: 110310 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:15:42,331-Speed 3334.18 samples/sec Loss 4.9367 LearningRate 0.0309 Epoch: 8 Global Step: 110320 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:15:45,423-Speed 3313.35 samples/sec Loss 4.8206 LearningRate 0.0309 Epoch: 8 Global Step: 110330 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:15:48,489-Speed 3340.90 samples/sec Loss 4.9025 LearningRate 0.0309 Epoch: 8 Global Step: 110340 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:15:51,562-Speed 3332.97 samples/sec Loss 4.9965 LearningRate 0.0309 Epoch: 8 Global Step: 110350 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:15:54,645-Speed 3323.69 samples/sec Loss 4.8321 LearningRate 0.0309 Epoch: 8 Global Step: 110360 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:15:57,697-Speed 3355.46 samples/sec Loss 4.8645 LearningRate 0.0309 Epoch: 8 Global Step: 110370 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 11:16:00,816-Speed 3284.66 samples/sec Loss 4.8632 LearningRate 0.0309 Epoch: 8 Global Step: 110380 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:16:03,878-Speed 3344.66 samples/sec Loss 4.9392 LearningRate 0.0309 Epoch: 8 Global Step: 110390 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:16:06,934-Speed 3351.92 samples/sec Loss 4.8665 LearningRate 0.0309 Epoch: 8 Global Step: 110400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:10,008-Speed 3332.46 samples/sec Loss 4.8643 LearningRate 0.0309 Epoch: 8 Global Step: 110410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:13,130-Speed 3280.90 samples/sec Loss 4.9398 LearningRate 0.0309 Epoch: 8 Global Step: 110420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:16,243-Speed 3290.06 samples/sec Loss 4.8466 LearningRate 0.0309 Epoch: 8 Global Step: 110430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:19,367-Speed 3279.39 samples/sec Loss 4.8568 LearningRate 0.0309 Epoch: 8 Global Step: 110440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:22,443-Speed 3330.06 samples/sec Loss 4.9552 LearningRate 0.0308 Epoch: 8 Global Step: 110450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:25,528-Speed 3320.69 samples/sec Loss 4.8332 LearningRate 0.0308 Epoch: 8 Global Step: 110460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:28,632-Speed 3300.62 samples/sec Loss 4.9016 LearningRate 0.0308 Epoch: 8 Global Step: 110470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:31,698-Speed 3340.32 samples/sec Loss 4.8442 LearningRate 0.0308 Epoch: 8 Global Step: 110480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:34,828-Speed 3272.31 samples/sec Loss 4.9040 LearningRate 0.0308 Epoch: 8 Global Step: 110490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
11:16:37,900-Speed 3334.37 samples/sec Loss 4.8602 LearningRate 0.0308 Epoch: 8 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:16:41,010-Speed 3294.13 samples/sec Loss 4.9266 LearningRate 0.0308 Epoch: 8 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:16:44,087-Speed 3328.49 samples/sec Loss 4.9094 LearningRate 0.0308 Epoch: 8 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:16:47,135-Speed 3361.02 samples/sec Loss 4.8984 LearningRate 0.0308 Epoch: 8 Global Step: 110530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:50,202-Speed 3340.34 samples/sec Loss 4.8206 LearningRate 0.0308 Epoch: 8 Global Step: 110540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:53,310-Speed 3295.16 samples/sec Loss 4.7813 LearningRate 0.0308 Epoch: 8 Global Step: 110550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:56,406-Speed 3308.99 samples/sec Loss 4.8329 LearningRate 0.0308 Epoch: 8 Global Step: 110560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:16:59,459-Speed 3354.67 samples/sec Loss 4.8085 LearningRate 0.0308 Epoch: 8 Global Step: 110570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:02,542-Speed 3323.02 samples/sec Loss 4.8496 LearningRate 0.0308 Epoch: 8 Global Step: 110580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:05,661-Speed 3283.78 samples/sec Loss 4.9385 LearningRate 0.0308 Epoch: 8 Global Step: 110590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:08,727-Speed 3340.86 samples/sec Loss 4.8589 LearningRate 0.0308 Epoch: 8 Global Step: 110600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:11,787-Speed 3347.60 samples/sec Loss 4.8908 LearningRate 0.0308 Epoch: 8 Global Step: 110610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:14,894-Speed 3297.42 samples/sec Loss 
4.8827 LearningRate 0.0308 Epoch: 8 Global Step: 110620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:17,950-Speed 3351.27 samples/sec Loss 4.8161 LearningRate 0.0308 Epoch: 8 Global Step: 110630 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:21,009-Speed 3348.38 samples/sec Loss 4.8450 LearningRate 0.0308 Epoch: 8 Global Step: 110640 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:24,070-Speed 3347.04 samples/sec Loss 4.8813 LearningRate 0.0308 Epoch: 8 Global Step: 110650 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:27,257-Speed 3214.27 samples/sec Loss 4.8568 LearningRate 0.0308 Epoch: 8 Global Step: 110660 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:30,344-Speed 3317.87 samples/sec Loss 4.8678 LearningRate 0.0307 Epoch: 8 Global Step: 110670 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:33,402-Speed 3349.83 samples/sec Loss 4.8195 LearningRate 0.0307 Epoch: 8 Global Step: 110680 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:36,535-Speed 3269.67 samples/sec Loss 4.8288 LearningRate 0.0307 Epoch: 8 Global Step: 110690 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:39,651-Speed 3286.33 samples/sec Loss 4.9065 LearningRate 0.0307 Epoch: 8 Global Step: 110700 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:42,742-Speed 3314.58 samples/sec Loss 4.8973 LearningRate 0.0307 Epoch: 8 Global Step: 110710 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:17:45,818-Speed 3329.94 samples/sec Loss 4.8529 LearningRate 0.0307 Epoch: 8 Global Step: 110720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:48,960-Speed 3259.53 samples/sec Loss 4.7904 LearningRate 0.0307 Epoch: 8 Global Step: 110730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:52,124-Speed 3237.77 samples/sec Loss 4.9121 LearningRate 0.0307 Epoch: 8 Global 
Step: 110740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:55,275-Speed 3250.23 samples/sec Loss 4.8186 LearningRate 0.0307 Epoch: 8 Global Step: 110750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:17:58,332-Speed 3350.74 samples/sec Loss 4.9068 LearningRate 0.0307 Epoch: 8 Global Step: 110760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:01,505-Speed 3228.48 samples/sec Loss 4.8012 LearningRate 0.0307 Epoch: 8 Global Step: 110770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:04,600-Speed 3309.40 samples/sec Loss 4.9246 LearningRate 0.0307 Epoch: 8 Global Step: 110780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:07,773-Speed 3228.23 samples/sec Loss 4.9864 LearningRate 0.0307 Epoch: 8 Global Step: 110790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:10,894-Speed 3282.89 samples/sec Loss 4.8039 LearningRate 0.0307 Epoch: 8 Global Step: 110800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:14,027-Speed 3268.61 samples/sec Loss 4.7266 LearningRate 0.0307 Epoch: 8 Global Step: 110810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:17,181-Speed 3248.02 samples/sec Loss 4.8282 LearningRate 0.0307 Epoch: 8 Global Step: 110820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:18:20,232-Speed 3357.24 samples/sec Loss 4.8675 LearningRate 0.0307 Epoch: 8 Global Step: 110830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:23,321-Speed 3316.78 samples/sec Loss 4.8520 LearningRate 0.0307 Epoch: 8 Global Step: 110840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:26,405-Speed 3321.37 samples/sec Loss 4.8072 LearningRate 0.0307 Epoch: 8 Global Step: 110850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:29,530-Speed 3277.59 samples/sec Loss 4.8539 LearningRate 0.0307 Epoch: 8 Global Step: 110860 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:18:32,615-Speed 3320.36 samples/sec Loss 4.9055 LearningRate 0.0307 Epoch: 8 Global Step: 110870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:35,770-Speed 3246.09 samples/sec Loss 4.8244 LearningRate 0.0307 Epoch: 8 Global Step: 110880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:38,878-Speed 3295.35 samples/sec Loss 4.9179 LearningRate 0.0306 Epoch: 8 Global Step: 110890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:18:41,924-Speed 3363.96 samples/sec Loss 4.8404 LearningRate 0.0306 Epoch: 8 Global Step: 110900 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:18:45,012-Speed 3317.44 samples/sec Loss 4.8836 LearningRate 0.0306 Epoch: 8 Global Step: 110910 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:18:48,168-Speed 3245.12 samples/sec Loss 4.8822 LearningRate 0.0306 Epoch: 8 Global Step: 110920 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:18:51,240-Speed 3335.29 samples/sec Loss 4.8425 LearningRate 0.0306 Epoch: 8 Global Step: 110930 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:18:54,350-Speed 3293.30 samples/sec Loss 4.9117 LearningRate 0.0306 Epoch: 8 Global Step: 110940 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:18:57,409-Speed 3349.33 samples/sec Loss 4.9605 LearningRate 0.0306 Epoch: 8 Global Step: 110950 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:19:00,545-Speed 3265.57 samples/sec Loss 4.9548 LearningRate 0.0306 Epoch: 8 Global Step: 110960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:19:03,646-Speed 3303.14 samples/sec Loss 4.9286 LearningRate 0.0306 Epoch: 8 Global Step: 110970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:19:06,716-Speed 3337.26 samples/sec Loss 4.9421 LearningRate 0.0306 Epoch: 8 Global Step: 110980 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 
11:19:09,790-Speed 3331.57 samples/sec Loss 4.8316 LearningRate 0.0306 Epoch: 8 Global Step: 110990 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:19:12,934-Speed 3259.03 samples/sec Loss 4.8707 LearningRate 0.0306 Epoch: 8 Global Step: 111000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:16,115-Speed 3220.03 samples/sec Loss 4.8678 LearningRate 0.0306 Epoch: 8 Global Step: 111010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:19,235-Speed 3283.16 samples/sec Loss 4.8176 LearningRate 0.0306 Epoch: 8 Global Step: 111020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:22,297-Speed 3345.12 samples/sec Loss 4.8533 LearningRate 0.0306 Epoch: 8 Global Step: 111030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:25,379-Speed 3324.31 samples/sec Loss 4.8167 LearningRate 0.0306 Epoch: 8 Global Step: 111040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:28,496-Speed 3285.91 samples/sec Loss 4.7346 LearningRate 0.0306 Epoch: 8 Global Step: 111050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:31,591-Speed 3309.88 samples/sec Loss 4.9524 LearningRate 0.0306 Epoch: 8 Global Step: 111060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:34,693-Speed 3302.68 samples/sec Loss 4.9317 LearningRate 0.0306 Epoch: 8 Global Step: 111070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:37,832-Speed 3263.20 samples/sec Loss 4.8770 LearningRate 0.0306 Epoch: 8 Global Step: 111080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:40,890-Speed 3349.82 samples/sec Loss 4.8515 LearningRate 0.0306 Epoch: 8 Global Step: 111090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:44,000-Speed 3293.39 samples/sec Loss 4.8076 LearningRate 0.0306 Epoch: 8 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:19:47,114-Speed 3290.05 samples/sec Loss 
4.9090 LearningRate 0.0306 Epoch: 8 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:19:50,215-Speed 3302.62 samples/sec Loss 4.7631 LearningRate 0.0305 Epoch: 8 Global Step: 111120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:19:53,278-Speed 3343.89 samples/sec Loss 4.8488 LearningRate 0.0305 Epoch: 8 Global Step: 111130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:19:56,330-Speed 3356.27 samples/sec Loss 4.8829 LearningRate 0.0305 Epoch: 8 Global Step: 111140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:19:59,402-Speed 3334.82 samples/sec Loss 4.9359 LearningRate 0.0305 Epoch: 8 Global Step: 111150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:02,511-Speed 3295.32 samples/sec Loss 4.9313 LearningRate 0.0305 Epoch: 8 Global Step: 111160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:05,630-Speed 3283.51 samples/sec Loss 4.8267 LearningRate 0.0305 Epoch: 8 Global Step: 111170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:08,719-Speed 3316.08 samples/sec Loss 4.8911 LearningRate 0.0305 Epoch: 8 Global Step: 111180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:11,844-Speed 3278.73 samples/sec Loss 4.9124 LearningRate 0.0305 Epoch: 8 Global Step: 111190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:14,942-Speed 3306.24 samples/sec Loss 4.7897 LearningRate 0.0305 Epoch: 8 Global Step: 111200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:18,042-Speed 3303.98 samples/sec Loss 4.8933 LearningRate 0.0305 Epoch: 8 Global Step: 111210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:21,130-Speed 3316.95 samples/sec Loss 4.8675 LearningRate 0.0305 Epoch: 8 Global Step: 111220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:24,207-Speed 3328.90 samples/sec Loss 4.9389 LearningRate 0.0305 Epoch: 8 Global 
Step: 111230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:27,312-Speed 3299.57 samples/sec Loss 4.8758 LearningRate 0.0305 Epoch: 8 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:20:30,413-Speed 3303.26 samples/sec Loss 4.8372 LearningRate 0.0305 Epoch: 8 Global Step: 111250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:20:33,482-Speed 3337.85 samples/sec Loss 4.8396 LearningRate 0.0305 Epoch: 8 Global Step: 111260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:20:36,586-Speed 3300.37 samples/sec Loss 4.8864 LearningRate 0.0305 Epoch: 8 Global Step: 111270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:20:39,693-Speed 3296.55 samples/sec Loss 4.7759 LearningRate 0.0305 Epoch: 8 Global Step: 111280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:42,806-Speed 3290.61 samples/sec Loss 4.9399 LearningRate 0.0305 Epoch: 8 Global Step: 111290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:45,881-Speed 3331.08 samples/sec Loss 4.8677 LearningRate 0.0305 Epoch: 8 Global Step: 111300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:48,980-Speed 3305.55 samples/sec Loss 4.8813 LearningRate 0.0305 Epoch: 8 Global Step: 111310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:52,047-Speed 3339.71 samples/sec Loss 4.7510 LearningRate 0.0305 Epoch: 8 Global Step: 111320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:55,202-Speed 3246.32 samples/sec Loss 4.8894 LearningRate 0.0305 Epoch: 8 Global Step: 111330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:20:58,270-Speed 3338.65 samples/sec Loss 4.8954 LearningRate 0.0304 Epoch: 8 Global Step: 111340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:21:01,349-Speed 3328.83 samples/sec Loss 4.9992 LearningRate 0.0304 Epoch: 8 Global Step: 111350 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:21:04,454-Speed 3299.40 samples/sec Loss 4.8144 LearningRate 0.0304 Epoch: 8 Global Step: 111360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:21:07,557-Speed 3301.40 samples/sec Loss 4.7974 LearningRate 0.0304 Epoch: 8 Global Step: 111370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:21:10,609-Speed 3355.66 samples/sec Loss 4.8407 LearningRate 0.0304 Epoch: 8 Global Step: 111380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:21:13,737-Speed 3274.79 samples/sec Loss 4.9411 LearningRate 0.0304 Epoch: 8 Global Step: 111390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:21:16,831-Speed 3311.19 samples/sec Loss 4.8349 LearningRate 0.0304 Epoch: 8 Global Step: 111400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:21:19,913-Speed 3322.88 samples/sec Loss 4.8716 LearningRate 0.0304 Epoch: 8 Global Step: 111410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:21:22,978-Speed 3342.02 samples/sec Loss 4.8353 LearningRate 0.0304 Epoch: 8 Global Step: 111420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:21:26,091-Speed 3291.18 samples/sec Loss 4.8993 LearningRate 0.0304 Epoch: 8 Global Step: 111430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:21:29,176-Speed 3319.84 samples/sec Loss 4.7774 LearningRate 0.0304 Epoch: 8 Global Step: 111440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:21:32,302-Speed 3277.25 samples/sec Loss 4.8774 LearningRate 0.0304 Epoch: 8 Global Step: 111450 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:21:35,441-Speed 3263.36 samples/sec Loss 4.9025 LearningRate 0.0304 Epoch: 8 Global Step: 111460 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:21:38,542-Speed 3303.41 samples/sec Loss 4.9497 LearningRate 0.0304 Epoch: 8 Global Step: 111470 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 
11:21:41,678-Speed 3266.04 samples/sec Loss 4.9077 LearningRate 0.0304 Epoch: 8 Global Step: 111480 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:21:44,748-Speed 3336.20 samples/sec Loss 4.8924 LearningRate 0.0304 Epoch: 8 Global Step: 111490 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:21:47,810-Speed 3345.56 samples/sec Loss 4.8325 LearningRate 0.0304 Epoch: 8 Global Step: 111500 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:21:50,893-Speed 3322.24 samples/sec Loss 4.9085 LearningRate 0.0304 Epoch: 8 Global Step: 111510 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:21:54,011-Speed 3285.06 samples/sec Loss 4.8429 LearningRate 0.0304 Epoch: 8 Global Step: 111520 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:21:57,140-Speed 3273.40 samples/sec Loss 4.8908 LearningRate 0.0304 Epoch: 8 Global Step: 111530 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:22:00,226-Speed 3320.03 samples/sec Loss 4.7954 LearningRate 0.0304 Epoch: 8 Global Step: 111540 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:22:03,338-Speed 3290.58 samples/sec Loss 4.8789 LearningRate 0.0304 Epoch: 8 Global Step: 111550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:06,421-Speed 3323.13 samples/sec Loss 4.8727 LearningRate 0.0304 Epoch: 8 Global Step: 111560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:09,535-Speed 3289.28 samples/sec Loss 4.8084 LearningRate 0.0303 Epoch: 8 Global Step: 111570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:12,658-Speed 3279.38 samples/sec Loss 4.8583 LearningRate 0.0303 Epoch: 8 Global Step: 111580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:15,757-Speed 3305.66 samples/sec Loss 4.9671 LearningRate 0.0303 Epoch: 8 Global Step: 111590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:18,857-Speed 3304.30 samples/sec Loss 
4.7442 LearningRate 0.0303 Epoch: 8 Global Step: 111600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:21,961-Speed 3299.54 samples/sec Loss 4.8057 LearningRate 0.0303 Epoch: 8 Global Step: 111610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:25,060-Speed 3305.52 samples/sec Loss 4.7932 LearningRate 0.0303 Epoch: 8 Global Step: 111620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:28,178-Speed 3284.77 samples/sec Loss 4.7099 LearningRate 0.0303 Epoch: 8 Global Step: 111630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:31,299-Speed 3282.77 samples/sec Loss 4.9246 LearningRate 0.0303 Epoch: 8 Global Step: 111640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:34,417-Speed 3284.96 samples/sec Loss 4.9088 LearningRate 0.0303 Epoch: 8 Global Step: 111650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:22:37,526-Speed 3294.99 samples/sec Loss 4.8242 LearningRate 0.0303 Epoch: 8 Global Step: 111660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:40,614-Speed 3317.17 samples/sec Loss 4.9965 LearningRate 0.0303 Epoch: 8 Global Step: 111670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:43,706-Speed 3313.11 samples/sec Loss 4.7630 LearningRate 0.0303 Epoch: 8 Global Step: 111680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:46,818-Speed 3291.20 samples/sec Loss 4.8283 LearningRate 0.0303 Epoch: 8 Global Step: 111690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:49,907-Speed 3315.51 samples/sec Loss 4.8525 LearningRate 0.0303 Epoch: 8 Global Step: 111700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:53,019-Speed 3291.32 samples/sec Loss 4.8473 LearningRate 0.0303 Epoch: 8 Global Step: 111710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:56,110-Speed 3314.15 samples/sec Loss 4.8267 LearningRate 0.0303 Epoch: 8 Global 
Step: 111720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:22:59,237-Speed 3276.42 samples/sec Loss 4.8429 LearningRate 0.0303 Epoch: 8 Global Step: 111730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:23:02,411-Speed 3226.13 samples/sec Loss 4.8057 LearningRate 0.0303 Epoch: 8 Global Step: 111740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:23:05,554-Speed 3259.97 samples/sec Loss 4.7319 LearningRate 0.0303 Epoch: 8 Global Step: 111750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:23:08,656-Speed 3301.49 samples/sec Loss 4.8529 LearningRate 0.0303 Epoch: 8 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:23:11,740-Speed 3321.44 samples/sec Loss 4.7930 LearningRate 0.0303 Epoch: 8 Global Step: 111770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:23:15,044-Speed 3100.51 samples/sec Loss 4.9289 LearningRate 0.0303 Epoch: 8 Global Step: 111780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:23:46,372-Speed 326.87 samples/sec Loss 4.7892 LearningRate 0.0302 Epoch: 9 Global Step: 111790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:23:49,871-Speed 2927.71 samples/sec Loss 3.5898 LearningRate 0.0302 Epoch: 9 Global Step: 111800 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:23:53,055-Speed 3218.04 samples/sec Loss 3.6624 LearningRate 0.0302 Epoch: 9 Global Step: 111810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:23:56,142-Speed 3318.83 samples/sec Loss 3.5538 LearningRate 0.0302 Epoch: 9 Global Step: 111820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:23:59,229-Speed 3317.87 samples/sec Loss 3.6540 LearningRate 0.0302 Epoch: 9 Global Step: 111830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:02,381-Speed 3250.21 samples/sec Loss 3.6090 LearningRate 0.0302 Epoch: 9 Global Step: 111840 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 11:24:05,715-Speed 3071.89 samples/sec Loss 3.7103 LearningRate 0.0302 Epoch: 9 Global Step: 111850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:08,814-Speed 3305.39 samples/sec Loss 3.6201 LearningRate 0.0302 Epoch: 9 Global Step: 111860 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:12,053-Speed 3162.97 samples/sec Loss 3.4516 LearningRate 0.0302 Epoch: 9 Global Step: 111870 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:15,227-Speed 3226.78 samples/sec Loss 3.6424 LearningRate 0.0302 Epoch: 9 Global Step: 111880 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:18,380-Speed 3248.91 samples/sec Loss 3.6586 LearningRate 0.0302 Epoch: 9 Global Step: 111890 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:21,449-Speed 3337.26 samples/sec Loss 3.6319 LearningRate 0.0302 Epoch: 9 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:24:24,610-Speed 3241.14 samples/sec Loss 3.6270 LearningRate 0.0302 Epoch: 9 Global Step: 111910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:24:27,692-Speed 3323.18 samples/sec Loss 3.5789 LearningRate 0.0302 Epoch: 9 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:24:30,781-Speed 3316.81 samples/sec Loss 3.6072 LearningRate 0.0302 Epoch: 9 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:24:33,888-Speed 3296.27 samples/sec Loss 3.4913 LearningRate 0.0302 Epoch: 9 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:24:37,001-Speed 3289.95 samples/sec Loss 3.6041 LearningRate 0.0302 Epoch: 9 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:24:40,082-Speed 3324.95 samples/sec Loss 3.5902 LearningRate 0.0302 Epoch: 9 Global Step: 111960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 
11:24:43,224-Speed 3260.18 samples/sec Loss 3.5982 LearningRate 0.0302 Epoch: 9 Global Step: 111970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:46,296-Speed 3334.78 samples/sec Loss 3.6634 LearningRate 0.0302 Epoch: 9 Global Step: 111980 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:49,505-Speed 3191.87 samples/sec Loss 3.5467 LearningRate 0.0302 Epoch: 9 Global Step: 111990 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:52,570-Speed 3341.37 samples/sec Loss 3.5299 LearningRate 0.0302 Epoch: 9 Global Step: 112000 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:55,699-Speed 3273.67 samples/sec Loss 3.6116 LearningRate 0.0302 Epoch: 9 Global Step: 112010 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:24:58,809-Speed 3294.24 samples/sec Loss 3.6673 LearningRate 0.0301 Epoch: 9 Global Step: 112020 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:25:01,942-Speed 3268.90 samples/sec Loss 3.5575 LearningRate 0.0301 Epoch: 9 Global Step: 112030 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:25:05,065-Speed 3279.56 samples/sec Loss 3.7011 LearningRate 0.0301 Epoch: 9 Global Step: 112040 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:25:08,146-Speed 3325.06 samples/sec Loss 3.5918 LearningRate 0.0301 Epoch: 9 Global Step: 112050 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:25:11,248-Speed 3302.28 samples/sec Loss 3.6412 LearningRate 0.0301 Epoch: 9 Global Step: 112060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:14,398-Speed 3252.29 samples/sec Loss 3.6569 LearningRate 0.0301 Epoch: 9 Global Step: 112070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:17,469-Speed 3335.18 samples/sec Loss 3.6177 LearningRate 0.0301 Epoch: 9 Global Step: 112080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:20,530-Speed 3345.73 samples/sec Loss 
3.7815 LearningRate 0.0301 Epoch: 9 Global Step: 112090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:23,604-Speed 3332.46 samples/sec Loss 3.7385 LearningRate 0.0301 Epoch: 9 Global Step: 112100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:26,681-Speed 3329.49 samples/sec Loss 3.6278 LearningRate 0.0301 Epoch: 9 Global Step: 112110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:29,801-Speed 3282.80 samples/sec Loss 3.6212 LearningRate 0.0301 Epoch: 9 Global Step: 112120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:32,913-Speed 3293.90 samples/sec Loss 3.6330 LearningRate 0.0301 Epoch: 9 Global Step: 112130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:36,035-Speed 3281.30 samples/sec Loss 3.7392 LearningRate 0.0301 Epoch: 9 Global Step: 112140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:39,138-Speed 3301.07 samples/sec Loss 3.5948 LearningRate 0.0301 Epoch: 9 Global Step: 112150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:42,276-Speed 3263.92 samples/sec Loss 3.7531 LearningRate 0.0301 Epoch: 9 Global Step: 112160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:25:45,338-Speed 3345.26 samples/sec Loss 3.6484 LearningRate 0.0301 Epoch: 9 Global Step: 112170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:25:48,367-Speed 3382.25 samples/sec Loss 3.7776 LearningRate 0.0301 Epoch: 9 Global Step: 112180 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:25:51,487-Speed 3282.99 samples/sec Loss 3.6300 LearningRate 0.0301 Epoch: 9 Global Step: 112190 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:25:54,636-Speed 3253.12 samples/sec Loss 3.6549 LearningRate 0.0301 Epoch: 9 Global Step: 112200 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:25:57,690-Speed 3354.51 samples/sec Loss 3.7363 LearningRate 0.0301 Epoch: 9 Global 
Step: 112210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:26:00,797-Speed 3296.28 samples/sec Loss 3.8035 LearningRate 0.0301 Epoch: 9 Global Step: 112220 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:26:03,868-Speed 3335.76 samples/sec Loss 3.7230 LearningRate 0.0301 Epoch: 9 Global Step: 112230 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:26:06,954-Speed 3319.04 samples/sec Loss 3.7054 LearningRate 0.0301 Epoch: 9 Global Step: 112240 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:26:10,014-Speed 3347.31 samples/sec Loss 3.6187 LearningRate 0.0300 Epoch: 9 Global Step: 112250 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:26:13,141-Speed 3275.71 samples/sec Loss 3.6766 LearningRate 0.0300 Epoch: 9 Global Step: 112260 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:26:16,197-Speed 3352.37 samples/sec Loss 3.8018 LearningRate 0.0300 Epoch: 9 Global Step: 112270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:26:19,246-Speed 3360.24 samples/sec Loss 3.7643 LearningRate 0.0300 Epoch: 9 Global Step: 112280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:22,301-Speed 3352.21 samples/sec Loss 3.6792 LearningRate 0.0300 Epoch: 9 Global Step: 112290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:25,405-Speed 3300.59 samples/sec Loss 3.7004 LearningRate 0.0300 Epoch: 9 Global Step: 112300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:28,576-Speed 3230.35 samples/sec Loss 3.7032 LearningRate 0.0300 Epoch: 9 Global Step: 112310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:31,707-Speed 3271.76 samples/sec Loss 3.6839 LearningRate 0.0300 Epoch: 9 Global Step: 112320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:34,784-Speed 3328.14 samples/sec Loss 3.6765 LearningRate 0.0300 Epoch: 9 Global Step: 112330 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:26:37,861-Speed 3329.77 samples/sec Loss 3.7271 LearningRate 0.0300 Epoch: 9 Global Step: 112340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:40,971-Speed 3293.26 samples/sec Loss 3.6695 LearningRate 0.0300 Epoch: 9 Global Step: 112350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:44,046-Speed 3330.98 samples/sec Loss 3.7807 LearningRate 0.0300 Epoch: 9 Global Step: 112360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:47,149-Speed 3301.21 samples/sec Loss 3.7834 LearningRate 0.0300 Epoch: 9 Global Step: 112370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:26:50,255-Speed 3298.11 samples/sec Loss 3.6717 LearningRate 0.0300 Epoch: 9 Global Step: 112380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:26:53,333-Speed 3327.53 samples/sec Loss 3.7490 LearningRate 0.0300 Epoch: 9 Global Step: 112390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:26:56,422-Speed 3316.51 samples/sec Loss 3.7006 LearningRate 0.0300 Epoch: 9 Global Step: 112400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:26:59,483-Speed 3345.85 samples/sec Loss 3.7837 LearningRate 0.0300 Epoch: 9 Global Step: 112410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:27:02,639-Speed 3245.70 samples/sec Loss 3.7294 LearningRate 0.0300 Epoch: 9 Global Step: 112420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:27:05,802-Speed 3238.42 samples/sec Loss 3.7013 LearningRate 0.0300 Epoch: 9 Global Step: 112430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:27:08,911-Speed 3295.43 samples/sec Loss 3.7547 LearningRate 0.0300 Epoch: 9 Global Step: 112440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:27:12,009-Speed 3305.90 samples/sec Loss 3.8122 LearningRate 0.0300 Epoch: 9 Global Step: 112450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
11:27:15,181-Speed 3229.25 samples/sec Loss 3.7609 LearningRate 0.0300 Epoch: 9 Global Step: 112460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:27:18,364-Speed 3218.19 samples/sec Loss 3.8248 LearningRate 0.0299 Epoch: 9 Global Step: 112470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:27:21,438-Speed 3332.64 samples/sec Loss 3.7349 LearningRate 0.0299 Epoch: 9 Global Step: 112480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:27:24,507-Speed 3338.03 samples/sec Loss 3.8082 LearningRate 0.0299 Epoch: 9 Global Step: 112490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:27:27,615-Speed 3295.64 samples/sec Loss 3.7977 LearningRate 0.0299 Epoch: 9 Global Step: 112500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:27:30,665-Speed 3357.98 samples/sec Loss 3.7189 LearningRate 0.0299 Epoch: 9 Global Step: 112510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:27:33,786-Speed 3282.64 samples/sec Loss 3.7214 LearningRate 0.0299 Epoch: 9 Global Step: 112520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:27:36,943-Speed 3244.44 samples/sec Loss 3.7197 LearningRate 0.0299 Epoch: 9 Global Step: 112530 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:27:40,090-Speed 3254.27 samples/sec Loss 3.8397 LearningRate 0.0299 Epoch: 9 Global Step: 112540 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:27:43,204-Speed 3290.40 samples/sec Loss 3.8423 LearningRate 0.0299 Epoch: 9 Global Step: 112550 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:27:46,325-Speed 3281.83 samples/sec Loss 3.8114 LearningRate 0.0299 Epoch: 9 Global Step: 112560 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:27:49,416-Speed 3313.58 samples/sec Loss 3.7632 LearningRate 0.0299 Epoch: 9 Global Step: 112570 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:27:52,588-Speed 3229.48 samples/sec Loss 
3.8106 LearningRate 0.0299 Epoch: 9 Global Step: 112580 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:27:55,671-Speed 3322.27 samples/sec Loss 3.7790 LearningRate 0.0299 Epoch: 9 Global Step: 112590 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:27:58,732-Speed 3346.64 samples/sec Loss 3.7684 LearningRate 0.0299 Epoch: 9 Global Step: 112600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:28:01,792-Speed 3347.23 samples/sec Loss 3.7950 LearningRate 0.0299 Epoch: 9 Global Step: 112610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:28:04,846-Speed 3354.29 samples/sec Loss 3.7842 LearningRate 0.0299 Epoch: 9 Global Step: 112620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:28:07,909-Speed 3344.85 samples/sec Loss 3.7781 LearningRate 0.0299 Epoch: 9 Global Step: 112630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:10,982-Speed 3332.85 samples/sec Loss 3.7829 LearningRate 0.0299 Epoch: 9 Global Step: 112640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:14,146-Speed 3236.99 samples/sec Loss 3.6777 LearningRate 0.0299 Epoch: 9 Global Step: 112650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:17,283-Speed 3265.14 samples/sec Loss 3.7862 LearningRate 0.0299 Epoch: 9 Global Step: 112660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:20,385-Speed 3302.72 samples/sec Loss 3.8396 LearningRate 0.0299 Epoch: 9 Global Step: 112670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:23,500-Speed 3288.24 samples/sec Loss 3.8416 LearningRate 0.0299 Epoch: 9 Global Step: 112680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:26,637-Speed 3265.11 samples/sec Loss 3.7523 LearningRate 0.0299 Epoch: 9 Global Step: 112690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:29,811-Speed 3227.77 samples/sec Loss 3.7861 LearningRate 0.0298 Epoch: 9 Global 
Step: 112700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:32,868-Speed 3350.10 samples/sec Loss 3.8585 LearningRate 0.0298 Epoch: 9 Global Step: 112710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:35,929-Speed 3347.01 samples/sec Loss 3.7941 LearningRate 0.0298 Epoch: 9 Global Step: 112720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:39,065-Speed 3266.01 samples/sec Loss 3.8224 LearningRate 0.0298 Epoch: 9 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:28:42,199-Speed 3268.52 samples/sec Loss 3.8599 LearningRate 0.0298 Epoch: 9 Global Step: 112740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:28:45,265-Speed 3340.26 samples/sec Loss 3.8355 LearningRate 0.0298 Epoch: 9 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:28:48,377-Speed 3291.76 samples/sec Loss 3.8869 LearningRate 0.0298 Epoch: 9 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:28:51,461-Speed 3322.01 samples/sec Loss 3.8643 LearningRate 0.0298 Epoch: 9 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:28:54,529-Speed 3338.90 samples/sec Loss 3.7841 LearningRate 0.0298 Epoch: 9 Global Step: 112780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:28:57,620-Speed 3314.06 samples/sec Loss 3.8220 LearningRate 0.0298 Epoch: 9 Global Step: 112790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:00,767-Speed 3254.93 samples/sec Loss 3.8389 LearningRate 0.0298 Epoch: 9 Global Step: 112800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:03,900-Speed 3269.38 samples/sec Loss 3.8847 LearningRate 0.0298 Epoch: 9 Global Step: 112810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:07,038-Speed 3264.49 samples/sec Loss 3.8086 LearningRate 0.0298 Epoch: 9 Global Step: 112820 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:29:10,108-Speed 3336.04 samples/sec Loss 3.7660 LearningRate 0.0298 Epoch: 9 Global Step: 112830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:13,205-Speed 3306.92 samples/sec Loss 3.7873 LearningRate 0.0298 Epoch: 9 Global Step: 112840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:16,266-Speed 3346.85 samples/sec Loss 3.8721 LearningRate 0.0298 Epoch: 9 Global Step: 112850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:19,364-Speed 3306.71 samples/sec Loss 3.8349 LearningRate 0.0298 Epoch: 9 Global Step: 112860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:22,415-Speed 3357.32 samples/sec Loss 3.8197 LearningRate 0.0298 Epoch: 9 Global Step: 112870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:29:25,521-Speed 3297.61 samples/sec Loss 3.8601 LearningRate 0.0298 Epoch: 9 Global Step: 112880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:28,700-Speed 3221.79 samples/sec Loss 3.8202 LearningRate 0.0298 Epoch: 9 Global Step: 112890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:31,804-Speed 3300.59 samples/sec Loss 3.7228 LearningRate 0.0298 Epoch: 9 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:34,890-Speed 3319.10 samples/sec Loss 3.8872 LearningRate 0.0298 Epoch: 9 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:37,982-Speed 3313.20 samples/sec Loss 3.9478 LearningRate 0.0298 Epoch: 9 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:41,066-Speed 3320.40 samples/sec Loss 3.8567 LearningRate 0.0297 Epoch: 9 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:44,155-Speed 3316.28 samples/sec Loss 3.8311 LearningRate 0.0297 Epoch: 9 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
11:29:47,242-Speed 3318.74 samples/sec Loss 3.8875 LearningRate 0.0297 Epoch: 9 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:50,303-Speed 3346.30 samples/sec Loss 3.8188 LearningRate 0.0297 Epoch: 9 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:53,525-Speed 3178.58 samples/sec Loss 3.9297 LearningRate 0.0297 Epoch: 9 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:29:56,618-Speed 3312.13 samples/sec Loss 3.9085 LearningRate 0.0297 Epoch: 9 Global Step: 112980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 11:29:59,663-Speed 3364.58 samples/sec Loss 3.8366 LearningRate 0.0297 Epoch: 9 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:30:02,808-Speed 3256.67 samples/sec Loss 3.8092 LearningRate 0.0297 Epoch: 9 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:30:05,870-Speed 3345.79 samples/sec Loss 3.8690 LearningRate 0.0297 Epoch: 9 Global Step: 113010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:08,941-Speed 3334.62 samples/sec Loss 3.9181 LearningRate 0.0297 Epoch: 9 Global Step: 113020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:12,006-Speed 3341.70 samples/sec Loss 3.8884 LearningRate 0.0297 Epoch: 9 Global Step: 113030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:15,089-Speed 3322.57 samples/sec Loss 3.8316 LearningRate 0.0297 Epoch: 9 Global Step: 113040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:18,200-Speed 3293.46 samples/sec Loss 3.9288 LearningRate 0.0297 Epoch: 9 Global Step: 113050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:21,260-Speed 3347.08 samples/sec Loss 3.8806 LearningRate 0.0297 Epoch: 9 Global Step: 113060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:24,330-Speed 3336.03 samples/sec 
Loss 3.8828 LearningRate 0.0297 Epoch: 9 Global Step: 113070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:27,386-Speed 3351.63 samples/sec Loss 3.8500 LearningRate 0.0297 Epoch: 9 Global Step: 113080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:30,524-Speed 3264.24 samples/sec Loss 3.8493 LearningRate 0.0297 Epoch: 9 Global Step: 113090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:33,606-Speed 3323.80 samples/sec Loss 3.9139 LearningRate 0.0297 Epoch: 9 Global Step: 113100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:36,675-Speed 3338.31 samples/sec Loss 3.9044 LearningRate 0.0297 Epoch: 9 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:30:39,775-Speed 3303.29 samples/sec Loss 3.8866 LearningRate 0.0297 Epoch: 9 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:30:42,841-Speed 3341.12 samples/sec Loss 3.8405 LearningRate 0.0297 Epoch: 9 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:30:45,876-Speed 3374.92 samples/sec Loss 3.8983 LearningRate 0.0297 Epoch: 9 Global Step: 113140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:48,983-Speed 3296.51 samples/sec Loss 3.8650 LearningRate 0.0297 Epoch: 9 Global Step: 113150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:52,187-Speed 3197.04 samples/sec Loss 3.8462 LearningRate 0.0296 Epoch: 9 Global Step: 113160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:55,295-Speed 3296.19 samples/sec Loss 3.8440 LearningRate 0.0296 Epoch: 9 Global Step: 113170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:30:58,401-Speed 3297.61 samples/sec Loss 3.8564 LearningRate 0.0296 Epoch: 9 Global Step: 113180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:01,492-Speed 3313.81 samples/sec Loss 3.7704 LearningRate 0.0296 Epoch: 9 
Global Step: 113190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:04,571-Speed 3327.34 samples/sec Loss 3.9235 LearningRate 0.0296 Epoch: 9 Global Step: 113200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:07,697-Speed 3276.78 samples/sec Loss 3.9881 LearningRate 0.0296 Epoch: 9 Global Step: 113210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:10,764-Speed 3340.04 samples/sec Loss 3.9455 LearningRate 0.0296 Epoch: 9 Global Step: 113220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:13,877-Speed 3290.03 samples/sec Loss 3.7813 LearningRate 0.0296 Epoch: 9 Global Step: 113230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:16,991-Speed 3288.93 samples/sec Loss 3.8930 LearningRate 0.0296 Epoch: 9 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:31:20,108-Speed 3286.36 samples/sec Loss 3.8944 LearningRate 0.0296 Epoch: 9 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:31:23,222-Speed 3289.43 samples/sec Loss 3.9409 LearningRate 0.0296 Epoch: 9 Global Step: 113260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:31:26,278-Speed 3352.34 samples/sec Loss 3.9928 LearningRate 0.0296 Epoch: 9 Global Step: 113270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:29,356-Speed 3328.35 samples/sec Loss 3.8617 LearningRate 0.0296 Epoch: 9 Global Step: 113280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:32,500-Speed 3257.32 samples/sec Loss 3.9703 LearningRate 0.0296 Epoch: 9 Global Step: 113290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:35,601-Speed 3304.06 samples/sec Loss 3.9345 LearningRate 0.0296 Epoch: 9 Global Step: 113300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:38,742-Speed 3260.27 samples/sec Loss 3.9766 LearningRate 0.0296 Epoch: 9 Global Step: 113310 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 11:31:41,855-Speed 3291.07 samples/sec Loss 3.8621 LearningRate 0.0296 Epoch: 9 Global Step: 113320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:44,910-Speed 3352.23 samples/sec Loss 3.8696 LearningRate 0.0296 Epoch: 9 Global Step: 113330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:48,000-Speed 3316.31 samples/sec Loss 3.8800 LearningRate 0.0296 Epoch: 9 Global Step: 113340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:51,082-Speed 3323.44 samples/sec Loss 3.8938 LearningRate 0.0296 Epoch: 9 Global Step: 113350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:54,223-Speed 3260.95 samples/sec Loss 3.9559 LearningRate 0.0296 Epoch: 9 Global Step: 113360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:31:57,317-Speed 3309.79 samples/sec Loss 3.8660 LearningRate 0.0296 Epoch: 9 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:32:00,425-Speed 3296.48 samples/sec Loss 3.9238 LearningRate 0.0295 Epoch: 9 Global Step: 113380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:32:03,596-Speed 3229.61 samples/sec Loss 3.7605 LearningRate 0.0295 Epoch: 9 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:32:06,661-Speed 3342.42 samples/sec Loss 3.8405 LearningRate 0.0295 Epoch: 9 Global Step: 113400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:09,762-Speed 3303.60 samples/sec Loss 3.8419 LearningRate 0.0295 Epoch: 9 Global Step: 113410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:12,850-Speed 3317.33 samples/sec Loss 3.8581 LearningRate 0.0295 Epoch: 9 Global Step: 113420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:15,937-Speed 3318.40 samples/sec Loss 3.9257 LearningRate 0.0295 Epoch: 9 Global Step: 113430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
11:32:19,053-Speed 3287.10 samples/sec Loss 3.9270 LearningRate 0.0295 Epoch: 9 Global Step: 113440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:22,131-Speed 3328.46 samples/sec Loss 3.8921 LearningRate 0.0295 Epoch: 9 Global Step: 113450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:25,252-Speed 3281.90 samples/sec Loss 3.8545 LearningRate 0.0295 Epoch: 9 Global Step: 113460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:28,309-Speed 3350.80 samples/sec Loss 3.8389 LearningRate 0.0295 Epoch: 9 Global Step: 113470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:31,395-Speed 3319.45 samples/sec Loss 4.0339 LearningRate 0.0295 Epoch: 9 Global Step: 113480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:34,466-Speed 3335.28 samples/sec Loss 3.9416 LearningRate 0.0295 Epoch: 9 Global Step: 113490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:37,641-Speed 3226.10 samples/sec Loss 4.0512 LearningRate 0.0295 Epoch: 9 Global Step: 113500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:32:40,760-Speed 3283.77 samples/sec Loss 3.9086 LearningRate 0.0295 Epoch: 9 Global Step: 113510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:43,919-Speed 3242.64 samples/sec Loss 3.9838 LearningRate 0.0295 Epoch: 9 Global Step: 113520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:47,026-Speed 3297.81 samples/sec Loss 3.9209 LearningRate 0.0295 Epoch: 9 Global Step: 113530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:50,086-Speed 3346.35 samples/sec Loss 3.9170 LearningRate 0.0295 Epoch: 9 Global Step: 113540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:53,146-Speed 3348.34 samples/sec Loss 3.8735 LearningRate 0.0295 Epoch: 9 Global Step: 113550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:56,241-Speed 3309.60 samples/sec Loss 
3.8640 LearningRate 0.0295 Epoch: 9 Global Step: 113560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:32:59,349-Speed 3295.58 samples/sec Loss 3.9839 LearningRate 0.0295 Epoch: 9 Global Step: 113570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:02,434-Speed 3319.97 samples/sec Loss 3.9637 LearningRate 0.0295 Epoch: 9 Global Step: 113580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:05,493-Speed 3348.31 samples/sec Loss 3.9469 LearningRate 0.0295 Epoch: 9 Global Step: 113590 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:08,557-Speed 3343.22 samples/sec Loss 3.9493 LearningRate 0.0295 Epoch: 9 Global Step: 113600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:11,608-Speed 3357.79 samples/sec Loss 3.9040 LearningRate 0.0294 Epoch: 9 Global Step: 113610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:14,671-Speed 3344.85 samples/sec Loss 3.9920 LearningRate 0.0294 Epoch: 9 Global Step: 113620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:17,803-Speed 3270.71 samples/sec Loss 3.9555 LearningRate 0.0294 Epoch: 9 Global Step: 113630 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:20,879-Speed 3329.56 samples/sec Loss 3.9185 LearningRate 0.0294 Epoch: 9 Global Step: 113640 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:23,985-Speed 3297.30 samples/sec Loss 3.9254 LearningRate 0.0294 Epoch: 9 Global Step: 113650 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:27,048-Speed 3344.35 samples/sec Loss 4.0611 LearningRate 0.0294 Epoch: 9 Global Step: 113660 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:30,142-Speed 3311.26 samples/sec Loss 3.9068 LearningRate 0.0294 Epoch: 9 Global Step: 113670 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:33,325-Speed 3217.89 samples/sec Loss 4.0222 LearningRate 0.0294 Epoch: 9 Global 
Step: 113680 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:33:36,508-Speed 3218.39 samples/sec Loss 3.9687 LearningRate 0.0294 Epoch: 9 Global Step: 113690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:39,652-Speed 3257.28 samples/sec Loss 4.0257 LearningRate 0.0294 Epoch: 9 Global Step: 113700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:42,803-Speed 3251.16 samples/sec Loss 4.0060 LearningRate 0.0294 Epoch: 9 Global Step: 113710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:45,883-Speed 3325.64 samples/sec Loss 3.9458 LearningRate 0.0294 Epoch: 9 Global Step: 113720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:49,005-Speed 3281.09 samples/sec Loss 4.0383 LearningRate 0.0294 Epoch: 9 Global Step: 113730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:52,127-Speed 3281.08 samples/sec Loss 3.9265 LearningRate 0.0294 Epoch: 9 Global Step: 113740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:55,293-Speed 3235.17 samples/sec Loss 3.9944 LearningRate 0.0294 Epoch: 9 Global Step: 113750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:33:58,445-Speed 3250.25 samples/sec Loss 3.9926 LearningRate 0.0294 Epoch: 9 Global Step: 113760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:01,536-Speed 3314.24 samples/sec Loss 4.0160 LearningRate 0.0294 Epoch: 9 Global Step: 113770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:04,629-Speed 3312.02 samples/sec Loss 4.0209 LearningRate 0.0294 Epoch: 9 Global Step: 113780 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:07,687-Speed 3349.12 samples/sec Loss 3.9741 LearningRate 0.0294 Epoch: 9 Global Step: 113790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:10,763-Speed 3329.99 samples/sec Loss 3.9429 LearningRate 0.0294 Epoch: 9 Global Step: 113800 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 11:34:13,841-Speed 3328.17 samples/sec Loss 4.0533 LearningRate 0.0294 Epoch: 9 Global Step: 113810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:16,895-Speed 3354.25 samples/sec Loss 3.9768 LearningRate 0.0294 Epoch: 9 Global Step: 113820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:19,969-Speed 3332.50 samples/sec Loss 3.9895 LearningRate 0.0294 Epoch: 9 Global Step: 113830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:23,073-Speed 3299.20 samples/sec Loss 3.9401 LearningRate 0.0293 Epoch: 9 Global Step: 113840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:26,131-Speed 3350.34 samples/sec Loss 4.0228 LearningRate 0.0293 Epoch: 9 Global Step: 113850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:29,192-Speed 3346.51 samples/sec Loss 4.0404 LearningRate 0.0293 Epoch: 9 Global Step: 113860 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:32,296-Speed 3299.91 samples/sec Loss 3.9355 LearningRate 0.0293 Epoch: 9 Global Step: 113870 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:34:35,407-Speed 3292.30 samples/sec Loss 3.9685 LearningRate 0.0293 Epoch: 9 Global Step: 113880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:38,518-Speed 3292.58 samples/sec Loss 4.0563 LearningRate 0.0293 Epoch: 9 Global Step: 113890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:41,642-Speed 3279.62 samples/sec Loss 4.0598 LearningRate 0.0293 Epoch: 9 Global Step: 113900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:44,706-Speed 3342.19 samples/sec Loss 3.9899 LearningRate 0.0293 Epoch: 9 Global Step: 113910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:47,820-Speed 3290.54 samples/sec Loss 3.9815 LearningRate 0.0293 Epoch: 9 Global Step: 113920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
11:34:50,877-Speed 3350.78 samples/sec Loss 3.9854 LearningRate 0.0293 Epoch: 9 Global Step: 113930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:53,934-Speed 3350.28 samples/sec Loss 4.0077 LearningRate 0.0293 Epoch: 9 Global Step: 113940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:34:57,025-Speed 3313.39 samples/sec Loss 4.1063 LearningRate 0.0293 Epoch: 9 Global Step: 113950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:00,135-Speed 3294.31 samples/sec Loss 4.0154 LearningRate 0.0293 Epoch: 9 Global Step: 113960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:03,270-Speed 3267.40 samples/sec Loss 3.9971 LearningRate 0.0293 Epoch: 9 Global Step: 113970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:06,360-Speed 3314.84 samples/sec Loss 4.0305 LearningRate 0.0293 Epoch: 9 Global Step: 113980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:35:09,495-Speed 3267.78 samples/sec Loss 3.9483 LearningRate 0.0293 Epoch: 9 Global Step: 113990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:12,645-Speed 3252.29 samples/sec Loss 4.0654 LearningRate 0.0293 Epoch: 9 Global Step: 114000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:15,769-Speed 3278.43 samples/sec Loss 3.9901 LearningRate 0.0293 Epoch: 9 Global Step: 114010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:18,895-Speed 3277.78 samples/sec Loss 4.0155 LearningRate 0.0293 Epoch: 9 Global Step: 114020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:21,989-Speed 3310.38 samples/sec Loss 3.9769 LearningRate 0.0293 Epoch: 9 Global Step: 114030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:25,140-Speed 3250.35 samples/sec Loss 4.0800 LearningRate 0.0293 Epoch: 9 Global Step: 114040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:35:28,287-Speed 3255.03 samples/sec Loss 
4.0904 LearningRate 0.0293 Epoch: 9 Global Step: 114050 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:31,382-Speed 3309.28 samples/sec Loss 4.0671 LearningRate 0.0293 Epoch: 9 Global Step: 114060 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:34,512-Speed 3273.01 samples/sec Loss 3.9431 LearningRate 0.0292 Epoch: 9 Global Step: 114070 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:37,618-Speed 3297.71 samples/sec Loss 4.0622 LearningRate 0.0292 Epoch: 9 Global Step: 114080 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:40,768-Speed 3251.54 samples/sec Loss 4.0271 LearningRate 0.0292 Epoch: 9 Global Step: 114090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:43,906-Speed 3264.59 samples/sec Loss 3.9867 LearningRate 0.0292 Epoch: 9 Global Step: 114100 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:47,018-Speed 3292.06 samples/sec Loss 4.0194 LearningRate 0.0292 Epoch: 9 Global Step: 114110 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:50,070-Speed 3356.11 samples/sec Loss 4.1465 LearningRate 0.0292 Epoch: 9 Global Step: 114120 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:53,199-Speed 3272.98 samples/sec Loss 3.9596 LearningRate 0.0292 Epoch: 9 Global Step: 114130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:56,313-Speed 3289.95 samples/sec Loss 4.1496 LearningRate 0.0292 Epoch: 9 Global Step: 114140 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:35:59,449-Speed 3265.89 samples/sec Loss 3.9898 LearningRate 0.0292 Epoch: 9 Global Step: 114150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:02,538-Speed 3316.05 samples/sec Loss 4.0145 LearningRate 0.0292 Epoch: 9 Global Step: 114160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:05,673-Speed 3267.99 samples/sec Loss 3.9915 LearningRate 0.0292 Epoch: 9 Global 
Step: 114170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:08,766-Speed 3311.62 samples/sec Loss 4.0957 LearningRate 0.0292 Epoch: 9 Global Step: 114180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:11,838-Speed 3334.36 samples/sec Loss 4.0103 LearningRate 0.0292 Epoch: 9 Global Step: 114190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:14,917-Speed 3326.06 samples/sec Loss 3.8924 LearningRate 0.0292 Epoch: 9 Global Step: 114200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:18,016-Speed 3305.47 samples/sec Loss 4.0395 LearningRate 0.0292 Epoch: 9 Global Step: 114210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:21,111-Speed 3310.06 samples/sec Loss 4.0905 LearningRate 0.0292 Epoch: 9 Global Step: 114220 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:24,244-Speed 3269.27 samples/sec Loss 4.0240 LearningRate 0.0292 Epoch: 9 Global Step: 114230 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:27,333-Speed 3316.49 samples/sec Loss 4.1131 LearningRate 0.0292 Epoch: 9 Global Step: 114240 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:30,433-Speed 3303.36 samples/sec Loss 4.0463 LearningRate 0.0292 Epoch: 9 Global Step: 114250 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:33,510-Speed 3329.56 samples/sec Loss 4.1159 LearningRate 0.0292 Epoch: 9 Global Step: 114260 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:36,613-Speed 3300.83 samples/sec Loss 4.0112 LearningRate 0.0292 Epoch: 9 Global Step: 114270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:39,696-Speed 3322.26 samples/sec Loss 4.0773 LearningRate 0.0292 Epoch: 9 Global Step: 114280 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:42,820-Speed 3278.92 samples/sec Loss 4.0333 LearningRate 0.0292 Epoch: 9 Global Step: 114290 Fp16 Grad Scale: 16384 
Required: 12 hours Training: 2022-04-27 11:36:45,920-Speed 3304.98 samples/sec Loss 4.0108 LearningRate 0.0291 Epoch: 9 Global Step: 114300 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:36:49,078-Speed 3243.53 samples/sec Loss 4.0857 LearningRate 0.0291 Epoch: 9 Global Step: 114310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:52,152-Speed 3331.90 samples/sec Loss 3.9855 LearningRate 0.0291 Epoch: 9 Global Step: 114320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:55,271-Speed 3283.95 samples/sec Loss 4.0746 LearningRate 0.0291 Epoch: 9 Global Step: 114330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:36:58,382-Speed 3293.37 samples/sec Loss 3.9154 LearningRate 0.0291 Epoch: 9 Global Step: 114340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:01,526-Speed 3257.61 samples/sec Loss 4.0774 LearningRate 0.0291 Epoch: 9 Global Step: 114350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:04,641-Speed 3288.73 samples/sec Loss 4.0483 LearningRate 0.0291 Epoch: 9 Global Step: 114360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:07,819-Speed 3223.43 samples/sec Loss 4.0607 LearningRate 0.0291 Epoch: 9 Global Step: 114370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:10,900-Speed 3324.06 samples/sec Loss 4.1186 LearningRate 0.0291 Epoch: 9 Global Step: 114380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:14,041-Speed 3261.58 samples/sec Loss 4.0654 LearningRate 0.0291 Epoch: 9 Global Step: 114390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:17,185-Speed 3257.79 samples/sec Loss 4.0656 LearningRate 0.0291 Epoch: 9 Global Step: 114400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:20,278-Speed 3311.80 samples/sec Loss 4.0828 LearningRate 0.0291 Epoch: 9 Global Step: 114410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
11:37:23,391-Speed 3291.16 samples/sec Loss 4.0873 LearningRate 0.0291 Epoch: 9 Global Step: 114420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:37:26,465-Speed 3332.45 samples/sec Loss 3.9653 LearningRate 0.0291 Epoch: 9 Global Step: 114430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:37:29,573-Speed 3295.93 samples/sec Loss 4.1416 LearningRate 0.0291 Epoch: 9 Global Step: 114440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 11:37:32,707-Speed 3268.05 samples/sec Loss 4.1213 LearningRate 0.0291 Epoch: 9 Global Step: 114450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:35,854-Speed 3255.23 samples/sec Loss 4.0638 LearningRate 0.0291 Epoch: 9 Global Step: 114460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:38,973-Speed 3283.99 samples/sec Loss 4.1820 LearningRate 0.0291 Epoch: 9 Global Step: 114470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:42,076-Speed 3301.66 samples/sec Loss 4.0627 LearningRate 0.0291 Epoch: 9 Global Step: 114480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:45,168-Speed 3311.97 samples/sec Loss 4.0595 LearningRate 0.0291 Epoch: 9 Global Step: 114490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:48,275-Speed 3296.98 samples/sec Loss 4.0480 LearningRate 0.0291 Epoch: 9 Global Step: 114500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:51,426-Speed 3250.93 samples/sec Loss 4.0604 LearningRate 0.0291 Epoch: 9 Global Step: 114510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:54,582-Speed 3245.23 samples/sec Loss 4.1037 LearningRate 0.0291 Epoch: 9 Global Step: 114520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:37:57,734-Speed 3250.22 samples/sec Loss 4.0049 LearningRate 0.0290 Epoch: 9 Global Step: 114530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:00,893-Speed 3243.16 samples/sec Loss 
4.1733 LearningRate 0.0290 Epoch: 9 Global Step: 114540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:04,009-Speed 3286.99 samples/sec Loss 4.1959 LearningRate 0.0290 Epoch: 9 Global Step: 114550 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:07,181-Speed 3229.33 samples/sec Loss 4.0671 LearningRate 0.0290 Epoch: 9 Global Step: 114560 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:10,274-Speed 3311.47 samples/sec Loss 4.1674 LearningRate 0.0290 Epoch: 9 Global Step: 114570 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:13,401-Speed 3275.53 samples/sec Loss 4.1061 LearningRate 0.0290 Epoch: 9 Global Step: 114580 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:16,552-Speed 3251.03 samples/sec Loss 4.1106 LearningRate 0.0290 Epoch: 9 Global Step: 114590 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:19,681-Speed 3273.15 samples/sec Loss 4.1738 LearningRate 0.0290 Epoch: 9 Global Step: 114600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:22,806-Speed 3277.89 samples/sec Loss 4.0693 LearningRate 0.0290 Epoch: 9 Global Step: 114610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:25,963-Speed 3245.28 samples/sec Loss 4.0753 LearningRate 0.0290 Epoch: 9 Global Step: 114620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:29,163-Speed 3201.17 samples/sec Loss 4.1141 LearningRate 0.0290 Epoch: 9 Global Step: 114630 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:32,274-Speed 3292.41 samples/sec Loss 4.0303 LearningRate 0.0290 Epoch: 9 Global Step: 114640 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 11:38:35,422-Speed 3254.03 samples/sec Loss 4.0717 LearningRate 0.0290 Epoch: 9 Global Step: 114650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:38,556-Speed 3268.52 samples/sec Loss 4.0331 LearningRate 0.0290 Epoch: 9 Global 
Step: 114660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:41,626-Speed 3336.17 samples/sec Loss 4.0633 LearningRate 0.0290 Epoch: 9 Global Step: 114670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:44,735-Speed 3294.38 samples/sec Loss 4.0847 LearningRate 0.0290 Epoch: 9 Global Step: 114680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:47,916-Speed 3220.65 samples/sec Loss 4.1029 LearningRate 0.0290 Epoch: 9 Global Step: 114690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:51,097-Speed 3219.42 samples/sec Loss 4.0763 LearningRate 0.0290 Epoch: 9 Global Step: 114700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:54,234-Speed 3265.93 samples/sec Loss 4.0857 LearningRate 0.0290 Epoch: 9 Global Step: 114710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:38:57,396-Speed 3239.20 samples/sec Loss 4.1542 LearningRate 0.0290 Epoch: 9 Global Step: 114720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 11:39:00,483-Speed 3318.23 samples/sec Loss 4.0154 LearningRate 0.0290 Epoch: 9 Global Step: 114730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:03,635-Speed 3250.41 samples/sec Loss 4.1553 LearningRate 0.0290 Epoch: 9 Global Step: 114740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:06,770-Speed 3267.37 samples/sec Loss 4.1497 LearningRate 0.0290 Epoch: 9 Global Step: 114750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:09,823-Speed 3354.18 samples/sec Loss 4.0142 LearningRate 0.0289 Epoch: 9 Global Step: 114760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:12,972-Speed 3253.57 samples/sec Loss 4.1288 LearningRate 0.0289 Epoch: 9 Global Step: 114770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:16,054-Speed 3322.91 samples/sec Loss 4.1517 LearningRate 0.0289 Epoch: 9 Global Step: 114780 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 11:39:19,158-Speed 3300.61 samples/sec Loss 4.1979 LearningRate 0.0289 Epoch: 9 Global Step: 114790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:22,279-Speed 3282.18 samples/sec Loss 4.0498 LearningRate 0.0289 Epoch: 9 Global Step: 114800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:25,368-Speed 3315.37 samples/sec Loss 4.0577 LearningRate 0.0289 Epoch: 9 Global Step: 114810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:39:28,497-Speed 3273.87 samples/sec Loss 4.0605 LearningRate 0.0289 Epoch: 9 Global Step: 114820 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:31,651-Speed 3247.95 samples/sec Loss 4.1317 LearningRate 0.0289 Epoch: 9 Global Step: 114830 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:34,748-Speed 3307.27 samples/sec Loss 4.0031 LearningRate 0.0289 Epoch: 9 Global Step: 114840 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:37,868-Speed 3283.06 samples/sec Loss 4.1479 LearningRate 0.0289 Epoch: 9 Global Step: 114850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:40,996-Speed 3275.05 samples/sec Loss 4.1102 LearningRate 0.0289 Epoch: 9 Global Step: 114860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:44,089-Speed 3311.62 samples/sec Loss 4.1462 LearningRate 0.0289 Epoch: 9 Global Step: 114870 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:47,195-Speed 3297.50 samples/sec Loss 4.1397 LearningRate 0.0289 Epoch: 9 Global Step: 114880 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:50,269-Speed 3332.55 samples/sec Loss 4.1162 LearningRate 0.0289 Epoch: 9 Global Step: 114890 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:53,361-Speed 3312.74 samples/sec Loss 4.0956 LearningRate 0.0289 Epoch: 9 Global Step: 114900 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
11:39:56,444-Speed 3322.32 samples/sec Loss 4.0997 LearningRate 0.0289 Epoch: 9 Global Step: 114910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:39:59,567-Speed 3280.15 samples/sec Loss 4.0967 LearningRate 0.0289 Epoch: 9 Global Step: 114920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:02,710-Speed 3259.67 samples/sec Loss 4.1908 LearningRate 0.0289 Epoch: 9 Global Step: 114930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:05,864-Speed 3247.06 samples/sec Loss 4.1049 LearningRate 0.0289 Epoch: 9 Global Step: 114940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:08,959-Speed 3310.19 samples/sec Loss 4.1062 LearningRate 0.0289 Epoch: 9 Global Step: 114950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:12,023-Speed 3343.12 samples/sec Loss 4.1074 LearningRate 0.0289 Epoch: 9 Global Step: 114960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:15,118-Speed 3309.22 samples/sec Loss 4.1477 LearningRate 0.0289 Epoch: 9 Global Step: 114970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:18,254-Speed 3266.06 samples/sec Loss 4.1293 LearningRate 0.0289 Epoch: 9 Global Step: 114980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:21,358-Speed 3300.11 samples/sec Loss 4.0796 LearningRate 0.0288 Epoch: 9 Global Step: 114990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:24,454-Speed 3308.98 samples/sec Loss 4.0733 LearningRate 0.0288 Epoch: 9 Global Step: 115000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:27,548-Speed 3310.73 samples/sec Loss 4.2012 LearningRate 0.0288 Epoch: 9 Global Step: 115010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:30,685-Speed 3264.58 samples/sec Loss 4.2116 LearningRate 0.0288 Epoch: 9 Global Step: 115020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:40:33,749-Speed 3342.73 samples/sec Loss 
4.1884 LearningRate 0.0288 Epoch: 9 Global Step: 115030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:40:36,784-Speed 3376.47 samples/sec Loss 4.1906 LearningRate 0.0288 Epoch: 9 Global Step: 115040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:39,913-Speed 3272.78 samples/sec Loss 4.1497 LearningRate 0.0288 Epoch: 9 Global Step: 115050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:42,997-Speed 3322.07 samples/sec Loss 4.2372 LearningRate 0.0288 Epoch: 9 Global Step: 115060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:46,056-Speed 3348.78 samples/sec Loss 4.2180 LearningRate 0.0288 Epoch: 9 Global Step: 115070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:49,132-Speed 3330.06 samples/sec Loss 4.2014 LearningRate 0.0288 Epoch: 9 Global Step: 115080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:52,257-Speed 3277.52 samples/sec Loss 4.1466 LearningRate 0.0288 Epoch: 9 Global Step: 115090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:55,411-Speed 3248.16 samples/sec Loss 4.1873 LearningRate 0.0288 Epoch: 9 Global Step: 115100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:40:58,487-Speed 3329.93 samples/sec Loss 4.0276 LearningRate 0.0288 Epoch: 9 Global Step: 115110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:01,644-Speed 3244.60 samples/sec Loss 4.1244 LearningRate 0.0288 Epoch: 9 Global Step: 115120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:04,763-Speed 3283.52 samples/sec Loss 4.1623 LearningRate 0.0288 Epoch: 9 Global Step: 115130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:07,876-Speed 3291.28 samples/sec Loss 4.1383 LearningRate 0.0288 Epoch: 9 Global Step: 115140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:41:10,996-Speed 3282.61 samples/sec Loss 4.0790 LearningRate 0.0288 Epoch: 9 Global 
Step: 115150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:41:14,150-Speed 3247.37 samples/sec Loss 4.1568 LearningRate 0.0288 Epoch: 9 Global Step: 115160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:17,361-Speed 3190.62 samples/sec Loss 4.1714 LearningRate 0.0288 Epoch: 9 Global Step: 115170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:20,492-Speed 3271.25 samples/sec Loss 4.0597 LearningRate 0.0288 Epoch: 9 Global Step: 115180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:23,595-Speed 3301.42 samples/sec Loss 4.1880 LearningRate 0.0288 Epoch: 9 Global Step: 115190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:26,719-Speed 3279.04 samples/sec Loss 4.1693 LearningRate 0.0288 Epoch: 9 Global Step: 115200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:29,860-Speed 3260.58 samples/sec Loss 4.1646 LearningRate 0.0288 Epoch: 9 Global Step: 115210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:32,955-Speed 3309.05 samples/sec Loss 4.1595 LearningRate 0.0287 Epoch: 9 Global Step: 115220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:36,084-Speed 3275.54 samples/sec Loss 4.0828 LearningRate 0.0287 Epoch: 9 Global Step: 115230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:41:39,212-Speed 3274.49 samples/sec Loss 4.1999 LearningRate 0.0287 Epoch: 9 Global Step: 115240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:41:42,328-Speed 3286.82 samples/sec Loss 4.1514 LearningRate 0.0287 Epoch: 9 Global Step: 115250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:41:45,421-Speed 3312.76 samples/sec Loss 4.1930 LearningRate 0.0287 Epoch: 9 Global Step: 115260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:41:48,505-Speed 3320.42 samples/sec Loss 4.1726 LearningRate 0.0287 Epoch: 9 Global Step: 115270 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-04-27 11:41:51,674-Speed 3233.12 samples/sec Loss 4.2318 LearningRate 0.0287 Epoch: 9 Global Step: 115280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:41:54,837-Speed 3238.07 samples/sec Loss 4.2205 LearningRate 0.0287 Epoch: 9 Global Step: 115290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:41:57,944-Speed 3297.04 samples/sec Loss 4.2011 LearningRate 0.0287 Epoch: 9 Global Step: 115300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:01,037-Speed 3311.32 samples/sec Loss 4.1455 LearningRate 0.0287 Epoch: 9 Global Step: 115310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:04,242-Speed 3196.27 samples/sec Loss 4.1808 LearningRate 0.0287 Epoch: 9 Global Step: 115320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:07,353-Speed 3292.68 samples/sec Loss 4.1276 LearningRate 0.0287 Epoch: 9 Global Step: 115330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:10,414-Speed 3347.01 samples/sec Loss 4.1139 LearningRate 0.0287 Epoch: 9 Global Step: 115340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:13,525-Speed 3291.78 samples/sec Loss 4.1720 LearningRate 0.0287 Epoch: 9 Global Step: 115350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:16,660-Speed 3267.53 samples/sec Loss 4.2175 LearningRate 0.0287 Epoch: 9 Global Step: 115360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:19,836-Speed 3225.04 samples/sec Loss 4.2212 LearningRate 0.0287 Epoch: 9 Global Step: 115370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:22,928-Speed 3313.16 samples/sec Loss 4.1571 LearningRate 0.0287 Epoch: 9 Global Step: 115380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:26,055-Speed 3276.25 samples/sec Loss 4.1830 LearningRate 0.0287 Epoch: 9 Global Step: 115390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
11:42:29,151-Speed 3308.65 samples/sec Loss 4.1588 LearningRate 0.0287 Epoch: 9 Global Step: 115400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:32,349-Speed 3202.09 samples/sec Loss 4.1354 LearningRate 0.0287 Epoch: 9 Global Step: 115410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:35,458-Speed 3295.62 samples/sec Loss 4.2161 LearningRate 0.0287 Epoch: 9 Global Step: 115420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:38,534-Speed 3329.07 samples/sec Loss 4.1290 LearningRate 0.0287 Epoch: 9 Global Step: 115430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:42:41,652-Speed 3286.16 samples/sec Loss 4.1109 LearningRate 0.0287 Epoch: 9 Global Step: 115440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:42:44,786-Speed 3267.82 samples/sec Loss 4.1541 LearningRate 0.0287 Epoch: 9 Global Step: 115450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:42:47,872-Speed 3319.85 samples/sec Loss 4.2427 LearningRate 0.0286 Epoch: 9 Global Step: 115460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:42:51,011-Speed 3262.58 samples/sec Loss 4.1430 LearningRate 0.0286 Epoch: 9 Global Step: 115470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:42:54,197-Speed 3215.96 samples/sec Loss 4.1199 LearningRate 0.0286 Epoch: 9 Global Step: 115480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:42:57,293-Speed 3308.37 samples/sec Loss 4.2557 LearningRate 0.0286 Epoch: 9 Global Step: 115490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:00,425-Speed 3270.75 samples/sec Loss 4.2149 LearningRate 0.0286 Epoch: 9 Global Step: 115500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:03,523-Speed 3306.12 samples/sec Loss 4.1318 LearningRate 0.0286 Epoch: 9 Global Step: 115510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:06,611-Speed 3316.68 samples/sec Loss 
4.1654 LearningRate 0.0286 Epoch: 9 Global Step: 115520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:09,674-Speed 3344.40 samples/sec Loss 4.2100 LearningRate 0.0286 Epoch: 9 Global Step: 115530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:12,780-Speed 3298.16 samples/sec Loss 4.2417 LearningRate 0.0286 Epoch: 9 Global Step: 115540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:43:15,889-Speed 3295.10 samples/sec Loss 4.2654 LearningRate 0.0286 Epoch: 9 Global Step: 115550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:43:19,042-Speed 3248.72 samples/sec Loss 4.1196 LearningRate 0.0286 Epoch: 9 Global Step: 115560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:22,108-Speed 3341.49 samples/sec Loss 4.2711 LearningRate 0.0286 Epoch: 9 Global Step: 115570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:25,216-Speed 3294.92 samples/sec Loss 4.1847 LearningRate 0.0286 Epoch: 9 Global Step: 115580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:28,297-Speed 3324.53 samples/sec Loss 4.1279 LearningRate 0.0286 Epoch: 9 Global Step: 115590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:31,400-Speed 3300.75 samples/sec Loss 4.3126 LearningRate 0.0286 Epoch: 9 Global Step: 115600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:34,487-Speed 3318.46 samples/sec Loss 4.2098 LearningRate 0.0286 Epoch: 9 Global Step: 115610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:37,628-Speed 3261.38 samples/sec Loss 4.1292 LearningRate 0.0286 Epoch: 9 Global Step: 115620 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:40,755-Speed 3275.21 samples/sec Loss 4.1868 LearningRate 0.0286 Epoch: 9 Global Step: 115630 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:43,894-Speed 3263.83 samples/sec Loss 4.2999 LearningRate 0.0286 Epoch: 9 Global 
Step: 115640 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:47,095-Speed 3200.17 samples/sec Loss 4.1406 LearningRate 0.0286 Epoch: 9 Global Step: 115650 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:50,161-Speed 3340.42 samples/sec Loss 4.2489 LearningRate 0.0286 Epoch: 9 Global Step: 115660 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:43:53,247-Speed 3320.08 samples/sec Loss 4.2655 LearningRate 0.0286 Epoch: 9 Global Step: 115670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:56,307-Speed 3347.49 samples/sec Loss 4.2162 LearningRate 0.0286 Epoch: 9 Global Step: 115680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:43:59,392-Speed 3319.15 samples/sec Loss 4.2076 LearningRate 0.0285 Epoch: 9 Global Step: 115690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:02,552-Speed 3242.03 samples/sec Loss 4.1149 LearningRate 0.0285 Epoch: 9 Global Step: 115700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:05,634-Speed 3324.17 samples/sec Loss 4.2768 LearningRate 0.0285 Epoch: 9 Global Step: 115710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:08,686-Speed 3356.08 samples/sec Loss 4.2923 LearningRate 0.0285 Epoch: 9 Global Step: 115720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:11,809-Speed 3279.77 samples/sec Loss 4.1646 LearningRate 0.0285 Epoch: 9 Global Step: 115730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:14,957-Speed 3253.32 samples/sec Loss 4.3696 LearningRate 0.0285 Epoch: 9 Global Step: 115740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:18,120-Speed 3238.75 samples/sec Loss 4.1991 LearningRate 0.0285 Epoch: 9 Global Step: 115750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:21,197-Speed 3329.00 samples/sec Loss 4.2507 LearningRate 0.0285 Epoch: 9 Global Step: 115760 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 11:44:24,335-Speed 3264.12 samples/sec Loss 4.1366 LearningRate 0.0285 Epoch: 9 Global Step: 115770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:44:27,401-Speed 3341.09 samples/sec Loss 4.2160 LearningRate 0.0285 Epoch: 9 Global Step: 115780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:44:30,479-Speed 3328.38 samples/sec Loss 4.2410 LearningRate 0.0285 Epoch: 9 Global Step: 115790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:44:33,544-Speed 3341.67 samples/sec Loss 4.0996 LearningRate 0.0285 Epoch: 9 Global Step: 115800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:44:36,613-Speed 3337.76 samples/sec Loss 4.2953 LearningRate 0.0285 Epoch: 9 Global Step: 115810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:39,684-Speed 3334.60 samples/sec Loss 4.1574 LearningRate 0.0285 Epoch: 9 Global Step: 115820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:42,759-Speed 3331.08 samples/sec Loss 4.3107 LearningRate 0.0285 Epoch: 9 Global Step: 115830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:45,827-Speed 3339.12 samples/sec Loss 4.1561 LearningRate 0.0285 Epoch: 9 Global Step: 115840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:48,885-Speed 3349.97 samples/sec Loss 4.1456 LearningRate 0.0285 Epoch: 9 Global Step: 115850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:51,951-Speed 3340.74 samples/sec Loss 4.1299 LearningRate 0.0285 Epoch: 9 Global Step: 115860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:55,006-Speed 3353.97 samples/sec Loss 4.1979 LearningRate 0.0285 Epoch: 9 Global Step: 115870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:44:58,059-Speed 3356.04 samples/sec Loss 4.3016 LearningRate 0.0285 Epoch: 9 Global Step: 115880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
11:45:01,179-Speed 3282.88 samples/sec Loss 4.2306 LearningRate 0.0285 Epoch: 9 Global Step: 115890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:45:04,310-Speed 3271.35 samples/sec Loss 4.1388 LearningRate 0.0285 Epoch: 9 Global Step: 115900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:45:07,426-Speed 3288.14 samples/sec Loss 4.1906 LearningRate 0.0285 Epoch: 9 Global Step: 115910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:45:10,488-Speed 3344.70 samples/sec Loss 4.2881 LearningRate 0.0284 Epoch: 9 Global Step: 115920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:45:13,560-Speed 3334.14 samples/sec Loss 4.2401 LearningRate 0.0284 Epoch: 9 Global Step: 115930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:16,675-Speed 3289.13 samples/sec Loss 4.1870 LearningRate 0.0284 Epoch: 9 Global Step: 115940 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:19,772-Speed 3307.15 samples/sec Loss 4.2281 LearningRate 0.0284 Epoch: 9 Global Step: 115950 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:22,914-Speed 3260.22 samples/sec Loss 4.0962 LearningRate 0.0284 Epoch: 9 Global Step: 115960 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:26,082-Speed 3233.47 samples/sec Loss 4.3209 LearningRate 0.0284 Epoch: 9 Global Step: 115970 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:29,225-Speed 3259.30 samples/sec Loss 4.2664 LearningRate 0.0284 Epoch: 9 Global Step: 115980 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:32,351-Speed 3276.22 samples/sec Loss 4.2032 LearningRate 0.0284 Epoch: 9 Global Step: 115990 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:35,459-Speed 3296.42 samples/sec Loss 4.2826 LearningRate 0.0284 Epoch: 9 Global Step: 116000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:38,535-Speed 3330.46 samples/sec Loss 
4.1173 LearningRate 0.0284 Epoch: 9 Global Step: 116010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:41,601-Speed 3340.51 samples/sec Loss 4.2659 LearningRate 0.0284 Epoch: 9 Global Step: 116020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:44,666-Speed 3342.65 samples/sec Loss 4.2865 LearningRate 0.0284 Epoch: 9 Global Step: 116030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:45:47,737-Speed 3335.20 samples/sec Loss 4.2270 LearningRate 0.0284 Epoch: 9 Global Step: 116040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:50,813-Speed 3329.64 samples/sec Loss 4.1638 LearningRate 0.0284 Epoch: 9 Global Step: 116050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:53,893-Speed 3325.88 samples/sec Loss 4.2624 LearningRate 0.0284 Epoch: 9 Global Step: 116060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:45:56,980-Speed 3318.50 samples/sec Loss 4.1987 LearningRate 0.0284 Epoch: 9 Global Step: 116070 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:00,089-Speed 3294.48 samples/sec Loss 4.1934 LearningRate 0.0284 Epoch: 9 Global Step: 116080 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:03,185-Speed 3308.28 samples/sec Loss 4.1655 LearningRate 0.0284 Epoch: 9 Global Step: 116090 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:06,369-Speed 3217.29 samples/sec Loss 4.2632 LearningRate 0.0284 Epoch: 9 Global Step: 116100 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:09,443-Speed 3332.37 samples/sec Loss 4.2998 LearningRate 0.0284 Epoch: 9 Global Step: 116110 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:12,548-Speed 3298.55 samples/sec Loss 4.2117 LearningRate 0.0284 Epoch: 9 Global Step: 116120 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:15,668-Speed 3282.92 samples/sec Loss 4.2411 LearningRate 0.0284 Epoch: 9 Global Step: 
116130 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:18,742-Speed 3332.81 samples/sec Loss 4.1826 LearningRate 0.0284 Epoch: 9 Global Step: 116140 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:21,828-Speed 3319.29 samples/sec Loss 4.2463 LearningRate 0.0283 Epoch: 9 Global Step: 116150 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:24,981-Speed 3248.06 samples/sec Loss 4.2317 LearningRate 0.0283 Epoch: 9 Global Step: 116160 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:46:28,117-Speed 3266.44 samples/sec Loss 4.2784 LearningRate 0.0283 Epoch: 9 Global Step: 116170 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:31,237-Speed 3284.09 samples/sec Loss 4.1198 LearningRate 0.0283 Epoch: 9 Global Step: 116180 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:34,289-Speed 3355.46 samples/sec Loss 4.2292 LearningRate 0.0283 Epoch: 9 Global Step: 116190 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:37,421-Speed 3271.32 samples/sec Loss 4.1294 LearningRate 0.0283 Epoch: 9 Global Step: 116200 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:40,488-Speed 3339.97 samples/sec Loss 4.2793 LearningRate 0.0283 Epoch: 9 Global Step: 116210 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:43,640-Speed 3249.33 samples/sec Loss 4.1675 LearningRate 0.0283 Epoch: 9 Global Step: 116220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:46,744-Speed 3300.59 samples/sec Loss 4.2922 LearningRate 0.0283 Epoch: 9 Global Step: 116230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:49,824-Speed 3326.01 samples/sec Loss 4.2368 LearningRate 0.0283 Epoch: 9 Global Step: 116240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:52,962-Speed 3263.26 samples/sec Loss 4.2820 LearningRate 0.0283 Epoch: 9 Global Step: 116250 Fp16 Grad Scale: 16384 Required: 11 
hours Training: 2022-04-27 11:46:56,020-Speed 3350.17 samples/sec Loss 4.2447 LearningRate 0.0283 Epoch: 9 Global Step: 116260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:46:59,072-Speed 3356.74 samples/sec Loss 4.2037 LearningRate 0.0283 Epoch: 9 Global Step: 116270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:02,219-Speed 3254.86 samples/sec Loss 4.1553 LearningRate 0.0283 Epoch: 9 Global Step: 116280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:05,321-Speed 3302.57 samples/sec Loss 4.2645 LearningRate 0.0283 Epoch: 9 Global Step: 116290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:08,414-Speed 3311.33 samples/sec Loss 4.3301 LearningRate 0.0283 Epoch: 9 Global Step: 116300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:11,530-Speed 3287.08 samples/sec Loss 4.2018 LearningRate 0.0283 Epoch: 9 Global Step: 116310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:14,691-Speed 3241.03 samples/sec Loss 4.2405 LearningRate 0.0283 Epoch: 9 Global Step: 116320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:17,817-Speed 3276.88 samples/sec Loss 4.2575 LearningRate 0.0283 Epoch: 9 Global Step: 116330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:20,952-Speed 3266.57 samples/sec Loss 4.2479 LearningRate 0.0283 Epoch: 9 Global Step: 116340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:24,124-Speed 3229.19 samples/sec Loss 4.3123 LearningRate 0.0283 Epoch: 9 Global Step: 116350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:27,245-Speed 3281.88 samples/sec Loss 4.2012 LearningRate 0.0283 Epoch: 9 Global Step: 116360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:47:30,407-Speed 3239.77 samples/sec Loss 4.2288 LearningRate 0.0283 Epoch: 9 Global Step: 116370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
11:47:33,448-Speed 3368.73 samples/sec Loss 4.2410 LearningRate 0.0283 Epoch: 9 Global Step: 116380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:36,616-Speed 3233.61 samples/sec Loss 4.2540 LearningRate 0.0282 Epoch: 9 Global Step: 116390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:39,705-Speed 3315.91 samples/sec Loss 4.2567 LearningRate 0.0282 Epoch: 9 Global Step: 116400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:42,792-Speed 3317.48 samples/sec Loss 4.2492 LearningRate 0.0282 Epoch: 9 Global Step: 116410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:45,867-Speed 3332.13 samples/sec Loss 4.2648 LearningRate 0.0282 Epoch: 9 Global Step: 116420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:48,998-Speed 3271.35 samples/sec Loss 4.3413 LearningRate 0.0282 Epoch: 9 Global Step: 116430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:52,144-Speed 3254.87 samples/sec Loss 4.2665 LearningRate 0.0282 Epoch: 9 Global Step: 116440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:55,257-Speed 3291.09 samples/sec Loss 4.2032 LearningRate 0.0282 Epoch: 9 Global Step: 116450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:47:58,364-Speed 3296.77 samples/sec Loss 4.3204 LearningRate 0.0282 Epoch: 9 Global Step: 116460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:48:01,469-Speed 3298.24 samples/sec Loss 4.2670 LearningRate 0.0282 Epoch: 9 Global Step: 116470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:48:04,601-Speed 3271.23 samples/sec Loss 4.3673 LearningRate 0.0282 Epoch: 9 Global Step: 116480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:07,740-Speed 3263.22 samples/sec Loss 4.2893 LearningRate 0.0282 Epoch: 9 Global Step: 116490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:10,848-Speed 3295.40 samples/sec Loss 
4.2225 LearningRate 0.0282 Epoch: 9 Global Step: 116500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:13,980-Speed 3271.14 samples/sec Loss 4.2421 LearningRate 0.0282 Epoch: 9 Global Step: 116510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:17,121-Speed 3260.89 samples/sec Loss 4.2939 LearningRate 0.0282 Epoch: 9 Global Step: 116520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:20,196-Speed 3330.87 samples/sec Loss 4.2006 LearningRate 0.0282 Epoch: 9 Global Step: 116530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:23,307-Speed 3292.40 samples/sec Loss 4.1298 LearningRate 0.0282 Epoch: 9 Global Step: 116540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:26,386-Speed 3327.26 samples/sec Loss 4.2899 LearningRate 0.0282 Epoch: 9 Global Step: 116550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:29,565-Speed 3222.48 samples/sec Loss 4.2136 LearningRate 0.0282 Epoch: 9 Global Step: 116560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:32,702-Speed 3264.73 samples/sec Loss 4.2420 LearningRate 0.0282 Epoch: 9 Global Step: 116570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:35,773-Speed 3335.70 samples/sec Loss 4.2720 LearningRate 0.0282 Epoch: 9 Global Step: 116580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:48:38,923-Speed 3252.41 samples/sec Loss 4.2875 LearningRate 0.0282 Epoch: 9 Global Step: 116590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:48:42,070-Speed 3253.88 samples/sec Loss 4.2639 LearningRate 0.0282 Epoch: 9 Global Step: 116600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:48:45,179-Speed 3295.80 samples/sec Loss 4.2418 LearningRate 0.0282 Epoch: 9 Global Step: 116610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:48:48,272-Speed 3311.07 samples/sec Loss 4.2372 LearningRate 0.0281 Epoch: 9 Global 
Step: 116620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:48:51,459-Speed 3214.00 samples/sec Loss 4.3016 LearningRate 0.0281 Epoch: 9 Global Step: 116630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:54,546-Speed 3318.52 samples/sec Loss 4.1907 LearningRate 0.0281 Epoch: 9 Global Step: 116640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:48:57,606-Speed 3347.86 samples/sec Loss 4.3008 LearningRate 0.0281 Epoch: 9 Global Step: 116650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:00,676-Speed 3335.96 samples/sec Loss 4.2495 LearningRate 0.0281 Epoch: 9 Global Step: 116660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:03,750-Speed 3331.84 samples/sec Loss 4.3331 LearningRate 0.0281 Epoch: 9 Global Step: 116670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:06,947-Speed 3204.74 samples/sec Loss 4.3364 LearningRate 0.0281 Epoch: 9 Global Step: 116680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:09,997-Speed 3358.37 samples/sec Loss 4.2788 LearningRate 0.0281 Epoch: 9 Global Step: 116690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:13,085-Speed 3316.72 samples/sec Loss 4.2549 LearningRate 0.0281 Epoch: 9 Global Step: 116700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:16,245-Speed 3241.80 samples/sec Loss 4.3056 LearningRate 0.0281 Epoch: 9 Global Step: 116710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:19,321-Speed 3329.28 samples/sec Loss 4.3280 LearningRate 0.0281 Epoch: 9 Global Step: 116720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:49:22,375-Speed 3354.69 samples/sec Loss 4.2868 LearningRate 0.0281 Epoch: 9 Global Step: 116730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:49:25,473-Speed 3306.23 samples/sec Loss 4.3367 LearningRate 0.0281 Epoch: 9 Global Step: 116740 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 11:49:28,526-Speed 3355.02 samples/sec Loss 4.2633 LearningRate 0.0281 Epoch: 9 Global Step: 116750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:31,622-Speed 3308.78 samples/sec Loss 4.3178 LearningRate 0.0281 Epoch: 9 Global Step: 116760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:34,685-Speed 3343.68 samples/sec Loss 4.3607 LearningRate 0.0281 Epoch: 9 Global Step: 116770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:37,777-Speed 3313.75 samples/sec Loss 4.2223 LearningRate 0.0281 Epoch: 9 Global Step: 116780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:40,855-Speed 3327.55 samples/sec Loss 4.1807 LearningRate 0.0281 Epoch: 9 Global Step: 116790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:43,955-Speed 3304.72 samples/sec Loss 4.2937 LearningRate 0.0281 Epoch: 9 Global Step: 116800 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:47,059-Speed 3299.87 samples/sec Loss 4.2549 LearningRate 0.0281 Epoch: 9 Global Step: 116810 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:50,206-Speed 3255.09 samples/sec Loss 4.2672 LearningRate 0.0281 Epoch: 9 Global Step: 116820 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:53,276-Speed 3336.50 samples/sec Loss 4.2991 LearningRate 0.0281 Epoch: 9 Global Step: 116830 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:56,428-Speed 3250.16 samples/sec Loss 4.3060 LearningRate 0.0281 Epoch: 9 Global Step: 116840 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:49:59,502-Speed 3331.52 samples/sec Loss 4.2470 LearningRate 0.0281 Epoch: 9 Global Step: 116850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:02,582-Speed 3326.17 samples/sec Loss 4.3767 LearningRate 0.0280 Epoch: 9 Global Step: 116860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
11:50:05,663-Speed 3324.38 samples/sec Loss 4.4089 LearningRate 0.0280 Epoch: 9 Global Step: 116870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:08,753-Speed 3314.92 samples/sec Loss 4.2883 LearningRate 0.0280 Epoch: 9 Global Step: 116880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:11,812-Speed 3348.89 samples/sec Loss 4.1990 LearningRate 0.0280 Epoch: 9 Global Step: 116890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:14,886-Speed 3331.32 samples/sec Loss 4.2976 LearningRate 0.0280 Epoch: 9 Global Step: 116900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:18,018-Speed 3270.82 samples/sec Loss 4.2308 LearningRate 0.0280 Epoch: 9 Global Step: 116910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:21,072-Speed 3354.48 samples/sec Loss 4.3292 LearningRate 0.0280 Epoch: 9 Global Step: 116920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:24,158-Speed 3318.34 samples/sec Loss 4.1927 LearningRate 0.0280 Epoch: 9 Global Step: 116930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:27,285-Speed 3276.63 samples/sec Loss 4.4082 LearningRate 0.0280 Epoch: 9 Global Step: 116940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:30,427-Speed 3259.90 samples/sec Loss 4.2369 LearningRate 0.0280 Epoch: 9 Global Step: 116950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:50:33,487-Speed 3347.16 samples/sec Loss 4.2312 LearningRate 0.0280 Epoch: 9 Global Step: 116960 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:36,618-Speed 3271.61 samples/sec Loss 4.2709 LearningRate 0.0280 Epoch: 9 Global Step: 116970 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:39,733-Speed 3288.54 samples/sec Loss 4.2392 LearningRate 0.0280 Epoch: 9 Global Step: 116980 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:42,877-Speed 3257.75 samples/sec Loss 
4.2175 LearningRate 0.0280 Epoch: 9 Global Step: 116990 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:45,934-Speed 3350.85 samples/sec Loss 4.2594 LearningRate 0.0280 Epoch: 9 Global Step: 117000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:49,065-Speed 3272.11 samples/sec Loss 4.2223 LearningRate 0.0280 Epoch: 9 Global Step: 117010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:52,147-Speed 3323.37 samples/sec Loss 4.2285 LearningRate 0.0280 Epoch: 9 Global Step: 117020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:55,261-Speed 3288.97 samples/sec Loss 4.3336 LearningRate 0.0280 Epoch: 9 Global Step: 117030 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:50:58,324-Speed 3344.41 samples/sec Loss 4.2914 LearningRate 0.0280 Epoch: 9 Global Step: 117040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:01,405-Speed 3324.29 samples/sec Loss 4.2789 LearningRate 0.0280 Epoch: 9 Global Step: 117050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:04,493-Speed 3317.14 samples/sec Loss 4.2362 LearningRate 0.0280 Epoch: 9 Global Step: 117060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:51:07,564-Speed 3335.85 samples/sec Loss 4.3160 LearningRate 0.0280 Epoch: 9 Global Step: 117070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:51:10,641-Speed 3328.46 samples/sec Loss 4.2398 LearningRate 0.0280 Epoch: 9 Global Step: 117080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:51:13,738-Speed 3308.27 samples/sec Loss 4.3099 LearningRate 0.0279 Epoch: 9 Global Step: 117090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:51:16,830-Speed 3312.45 samples/sec Loss 4.2666 LearningRate 0.0279 Epoch: 9 Global Step: 117100 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:19,908-Speed 3327.56 samples/sec Loss 4.2456 LearningRate 0.0279 Epoch: 9 Global 
Step: 117110 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:22,981-Speed 3333.52 samples/sec Loss 4.3041 LearningRate 0.0279 Epoch: 9 Global Step: 117120 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:26,047-Speed 3341.27 samples/sec Loss 4.3030 LearningRate 0.0279 Epoch: 9 Global Step: 117130 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:29,138-Speed 3313.52 samples/sec Loss 4.4190 LearningRate 0.0279 Epoch: 9 Global Step: 117140 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:32,187-Speed 3359.90 samples/sec Loss 4.2440 LearningRate 0.0279 Epoch: 9 Global Step: 117150 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:35,288-Speed 3303.37 samples/sec Loss 4.3663 LearningRate 0.0279 Epoch: 9 Global Step: 117160 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:38,346-Speed 3349.04 samples/sec Loss 4.3427 LearningRate 0.0279 Epoch: 9 Global Step: 117170 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:41,399-Speed 3356.27 samples/sec Loss 4.3071 LearningRate 0.0279 Epoch: 9 Global Step: 117180 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:44,468-Speed 3336.66 samples/sec Loss 4.3645 LearningRate 0.0279 Epoch: 9 Global Step: 117190 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:47,516-Speed 3361.87 samples/sec Loss 4.2655 LearningRate 0.0279 Epoch: 9 Global Step: 117200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:51:50,602-Speed 3319.11 samples/sec Loss 4.3074 LearningRate 0.0279 Epoch: 9 Global Step: 117210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:51:53,676-Speed 3332.18 samples/sec Loss 4.2209 LearningRate 0.0279 Epoch: 9 Global Step: 117220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:51:56,727-Speed 3357.47 samples/sec Loss 4.4450 LearningRate 0.0279 Epoch: 9 Global Step: 117230 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-04-27 11:51:59,784-Speed 3350.90 samples/sec Loss 4.2986 LearningRate 0.0279 Epoch: 9 Global Step: 117240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:02,905-Speed 3281.27 samples/sec Loss 4.2475 LearningRate 0.0279 Epoch: 9 Global Step: 117250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:06,027-Speed 3281.01 samples/sec Loss 4.3254 LearningRate 0.0279 Epoch: 9 Global Step: 117260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:09,101-Speed 3332.80 samples/sec Loss 4.2495 LearningRate 0.0279 Epoch: 9 Global Step: 117270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:12,240-Speed 3263.43 samples/sec Loss 4.2173 LearningRate 0.0279 Epoch: 9 Global Step: 117280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:15,389-Speed 3252.33 samples/sec Loss 4.3321 LearningRate 0.0279 Epoch: 9 Global Step: 117290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:18,484-Speed 3309.41 samples/sec Loss 4.3333 LearningRate 0.0279 Epoch: 9 Global Step: 117300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:21,593-Speed 3294.52 samples/sec Loss 4.2768 LearningRate 0.0279 Epoch: 9 Global Step: 117310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:24,684-Speed 3313.98 samples/sec Loss 4.2403 LearningRate 0.0279 Epoch: 9 Global Step: 117320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:27,817-Speed 3269.41 samples/sec Loss 4.3540 LearningRate 0.0278 Epoch: 9 Global Step: 117330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:30,929-Speed 3292.23 samples/sec Loss 4.3956 LearningRate 0.0278 Epoch: 9 Global Step: 117340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:34,015-Speed 3318.80 samples/sec Loss 4.3419 LearningRate 0.0278 Epoch: 9 Global Step: 117350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
11:52:37,149-Speed 3268.76 samples/sec Loss 4.3619 LearningRate 0.0278 Epoch: 9 Global Step: 117360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:40,283-Speed 3268.54 samples/sec Loss 4.4001 LearningRate 0.0278 Epoch: 9 Global Step: 117370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:43,375-Speed 3312.32 samples/sec Loss 4.2561 LearningRate 0.0278 Epoch: 9 Global Step: 117380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:46,452-Speed 3328.67 samples/sec Loss 4.2952 LearningRate 0.0278 Epoch: 9 Global Step: 117390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:49,578-Speed 3277.04 samples/sec Loss 4.2021 LearningRate 0.0278 Epoch: 9 Global Step: 117400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:52,708-Speed 3272.22 samples/sec Loss 4.2819 LearningRate 0.0278 Epoch: 9 Global Step: 117410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:52:55,795-Speed 3318.97 samples/sec Loss 4.2612 LearningRate 0.0278 Epoch: 9 Global Step: 117420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:52:58,915-Speed 3282.70 samples/sec Loss 4.2546 LearningRate 0.0278 Epoch: 9 Global Step: 117430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:02,002-Speed 3317.90 samples/sec Loss 4.3132 LearningRate 0.0278 Epoch: 9 Global Step: 117440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:05,183-Speed 3220.06 samples/sec Loss 4.3151 LearningRate 0.0278 Epoch: 9 Global Step: 117450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:08,303-Speed 3283.17 samples/sec Loss 4.3785 LearningRate 0.0278 Epoch: 9 Global Step: 117460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:11,501-Speed 3203.69 samples/sec Loss 4.3353 LearningRate 0.0278 Epoch: 9 Global Step: 117470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:14,589-Speed 3316.72 samples/sec Loss 
4.3088 LearningRate 0.0278 Epoch: 9 Global Step: 117480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:17,753-Speed 3237.67 samples/sec Loss 4.3387 LearningRate 0.0278 Epoch: 9 Global Step: 117490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:20,806-Speed 3355.36 samples/sec Loss 4.3812 LearningRate 0.0278 Epoch: 9 Global Step: 117500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:23,928-Speed 3281.08 samples/sec Loss 4.3353 LearningRate 0.0278 Epoch: 9 Global Step: 117510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:27,085-Speed 3244.12 samples/sec Loss 4.3780 LearningRate 0.0278 Epoch: 9 Global Step: 117520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:30,217-Speed 3270.39 samples/sec Loss 4.3347 LearningRate 0.0278 Epoch: 9 Global Step: 117530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:33,300-Speed 3322.83 samples/sec Loss 4.3257 LearningRate 0.0278 Epoch: 9 Global Step: 117540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:36,420-Speed 3283.42 samples/sec Loss 4.2913 LearningRate 0.0278 Epoch: 9 Global Step: 117550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:39,534-Speed 3288.88 samples/sec Loss 4.3136 LearningRate 0.0277 Epoch: 9 Global Step: 117560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:42,678-Speed 3258.52 samples/sec Loss 4.2128 LearningRate 0.0277 Epoch: 9 Global Step: 117570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:45,737-Speed 3348.86 samples/sec Loss 4.3514 LearningRate 0.0277 Epoch: 9 Global Step: 117580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:48,888-Speed 3250.03 samples/sec Loss 4.3477 LearningRate 0.0277 Epoch: 9 Global Step: 117590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:52,008-Speed 3283.47 samples/sec Loss 4.2563 LearningRate 0.0277 Epoch: 9 Global 
Step: 117600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:55,093-Speed 3320.13 samples/sec Loss 4.3089 LearningRate 0.0277 Epoch: 9 Global Step: 117610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:53:58,149-Speed 3351.44 samples/sec Loss 4.4025 LearningRate 0.0277 Epoch: 9 Global Step: 117620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:02,035-Speed 2635.61 samples/sec Loss 4.2950 LearningRate 0.0277 Epoch: 9 Global Step: 117630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:05,175-Speed 3262.18 samples/sec Loss 4.3142 LearningRate 0.0277 Epoch: 9 Global Step: 117640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:08,260-Speed 3320.99 samples/sec Loss 4.3266 LearningRate 0.0277 Epoch: 9 Global Step: 117650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:11,415-Speed 3246.29 samples/sec Loss 4.2295 LearningRate 0.0277 Epoch: 9 Global Step: 117660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:14,484-Speed 3337.84 samples/sec Loss 4.3190 LearningRate 0.0277 Epoch: 9 Global Step: 117670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:17,571-Speed 3317.44 samples/sec Loss 4.3164 LearningRate 0.0277 Epoch: 9 Global Step: 117680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:20,627-Speed 3352.56 samples/sec Loss 4.3465 LearningRate 0.0277 Epoch: 9 Global Step: 117690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:23,736-Speed 3294.94 samples/sec Loss 4.4158 LearningRate 0.0277 Epoch: 9 Global Step: 117700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:26,905-Speed 3231.87 samples/sec Loss 4.3857 LearningRate 0.0277 Epoch: 9 Global Step: 117710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:30,087-Speed 3218.51 samples/sec Loss 4.3428 LearningRate 0.0277 Epoch: 9 Global Step: 117720 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-27 11:54:33,173-Speed 3319.19 samples/sec Loss 4.4410 LearningRate 0.0277 Epoch: 9 Global Step: 117730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:54:36,345-Speed 3229.27 samples/sec Loss 4.2803 LearningRate 0.0277 Epoch: 9 Global Step: 117740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:54:39,510-Speed 3236.87 samples/sec Loss 4.4129 LearningRate 0.0277 Epoch: 9 Global Step: 117750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:54:42,606-Speed 3308.22 samples/sec Loss 4.2257 LearningRate 0.0277 Epoch: 9 Global Step: 117760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:45,715-Speed 3294.60 samples/sec Loss 4.3836 LearningRate 0.0277 Epoch: 9 Global Step: 117770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:48,825-Speed 3293.68 samples/sec Loss 4.3524 LearningRate 0.0277 Epoch: 9 Global Step: 117780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:51,962-Speed 3266.00 samples/sec Loss 4.2175 LearningRate 0.0277 Epoch: 9 Global Step: 117790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:55,054-Speed 3312.35 samples/sec Loss 4.2061 LearningRate 0.0276 Epoch: 9 Global Step: 117800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:54:58,133-Speed 3327.24 samples/sec Loss 4.2620 LearningRate 0.0276 Epoch: 9 Global Step: 117810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:01,249-Speed 3287.32 samples/sec Loss 4.3177 LearningRate 0.0276 Epoch: 9 Global Step: 117820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:04,379-Speed 3272.52 samples/sec Loss 4.3946 LearningRate 0.0276 Epoch: 9 Global Step: 117830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:07,492-Speed 3290.69 samples/sec Loss 4.3542 LearningRate 0.0276 Epoch: 9 Global Step: 117840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
11:55:10,590-Speed 3305.77 samples/sec Loss 4.3142 LearningRate 0.0276 Epoch: 9 Global Step: 117850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:13,794-Speed 3197.87 samples/sec Loss 4.4230 LearningRate 0.0276 Epoch: 9 Global Step: 117860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:55:16,983-Speed 3211.79 samples/sec Loss 4.3687 LearningRate 0.0276 Epoch: 9 Global Step: 117870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:55:20,117-Speed 3267.99 samples/sec Loss 4.3528 LearningRate 0.0276 Epoch: 9 Global Step: 117880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:55:23,182-Speed 3342.08 samples/sec Loss 4.3578 LearningRate 0.0276 Epoch: 9 Global Step: 117890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:55:26,272-Speed 3315.18 samples/sec Loss 4.3952 LearningRate 0.0276 Epoch: 9 Global Step: 117900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:29,382-Speed 3293.43 samples/sec Loss 4.3785 LearningRate 0.0276 Epoch: 9 Global Step: 117910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:32,487-Speed 3298.98 samples/sec Loss 4.3118 LearningRate 0.0276 Epoch: 9 Global Step: 117920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:35,608-Speed 3282.28 samples/sec Loss 4.2752 LearningRate 0.0276 Epoch: 9 Global Step: 117930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:38,722-Speed 3288.92 samples/sec Loss 4.4087 LearningRate 0.0276 Epoch: 9 Global Step: 117940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:41,803-Speed 3325.06 samples/sec Loss 4.1865 LearningRate 0.0276 Epoch: 9 Global Step: 117950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:44,874-Speed 3335.76 samples/sec Loss 4.3694 LearningRate 0.0276 Epoch: 9 Global Step: 117960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:47,977-Speed 3300.82 samples/sec Loss 
4.3500 LearningRate 0.0276 Epoch: 9 Global Step: 117970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:51,070-Speed 3310.65 samples/sec Loss 4.4120 LearningRate 0.0276 Epoch: 9 Global Step: 117980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:54,166-Speed 3309.05 samples/sec Loss 4.3680 LearningRate 0.0276 Epoch: 9 Global Step: 117990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:55:57,286-Speed 3283.31 samples/sec Loss 4.3178 LearningRate 0.0276 Epoch: 9 Global Step: 118000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:00,357-Speed 3335.45 samples/sec Loss 4.3886 LearningRate 0.0276 Epoch: 9 Global Step: 118010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:03,422-Speed 3341.78 samples/sec Loss 4.4462 LearningRate 0.0276 Epoch: 9 Global Step: 118020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:06,496-Speed 3332.00 samples/sec Loss 4.2906 LearningRate 0.0275 Epoch: 9 Global Step: 118030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:09,587-Speed 3313.67 samples/sec Loss 4.3551 LearningRate 0.0275 Epoch: 9 Global Step: 118040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:12,655-Speed 3339.22 samples/sec Loss 4.3236 LearningRate 0.0275 Epoch: 9 Global Step: 118050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:15,784-Speed 3274.05 samples/sec Loss 4.3813 LearningRate 0.0275 Epoch: 9 Global Step: 118060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:18,861-Speed 3328.51 samples/sec Loss 4.3402 LearningRate 0.0275 Epoch: 9 Global Step: 118070 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:21,934-Speed 3333.34 samples/sec Loss 4.3607 LearningRate 0.0275 Epoch: 9 Global Step: 118080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:25,134-Speed 3201.38 samples/sec Loss 4.3073 LearningRate 0.0275 Epoch: 9 Global 
Step: 118090 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:28,269-Speed 3267.29 samples/sec Loss 4.3897 LearningRate 0.0275 Epoch: 9 Global Step: 118100 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:31,360-Speed 3313.84 samples/sec Loss 4.3324 LearningRate 0.0275 Epoch: 9 Global Step: 118110 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:34,463-Speed 3300.83 samples/sec Loss 4.4760 LearningRate 0.0275 Epoch: 9 Global Step: 118120 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:37,628-Speed 3236.88 samples/sec Loss 4.3636 LearningRate 0.0275 Epoch: 9 Global Step: 118130 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:40,816-Speed 3213.04 samples/sec Loss 4.3827 LearningRate 0.0275 Epoch: 9 Global Step: 118140 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:43,950-Speed 3268.18 samples/sec Loss 4.3081 LearningRate 0.0275 Epoch: 9 Global Step: 118150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:47,051-Speed 3303.28 samples/sec Loss 4.3392 LearningRate 0.0275 Epoch: 9 Global Step: 118160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:50,132-Speed 3324.42 samples/sec Loss 4.3055 LearningRate 0.0275 Epoch: 9 Global Step: 118170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:56:53,262-Speed 3272.49 samples/sec Loss 4.4035 LearningRate 0.0275 Epoch: 9 Global Step: 118180 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:56,397-Speed 3267.46 samples/sec Loss 4.3023 LearningRate 0.0275 Epoch: 9 Global Step: 118190 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:56:59,480-Speed 3322.64 samples/sec Loss 4.3155 LearningRate 0.0275 Epoch: 9 Global Step: 118200 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:02,614-Speed 3268.03 samples/sec Loss 4.3714 LearningRate 0.0275 Epoch: 9 Global Step: 118210 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-04-27 11:57:05,742-Speed 3274.88 samples/sec Loss 4.2090 LearningRate 0.0275 Epoch: 9 Global Step: 118220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:08,846-Speed 3299.75 samples/sec Loss 4.3182 LearningRate 0.0275 Epoch: 9 Global Step: 118230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:12,002-Speed 3245.98 samples/sec Loss 4.2732 LearningRate 0.0275 Epoch: 9 Global Step: 118240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:15,197-Speed 3206.39 samples/sec Loss 4.3936 LearningRate 0.0275 Epoch: 9 Global Step: 118250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:18,320-Speed 3279.21 samples/sec Loss 4.3658 LearningRate 0.0275 Epoch: 9 Global Step: 118260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:21,412-Speed 3313.05 samples/sec Loss 4.3436 LearningRate 0.0274 Epoch: 9 Global Step: 118270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:24,546-Speed 3268.37 samples/sec Loss 4.3035 LearningRate 0.0274 Epoch: 9 Global Step: 118280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:57:27,667-Speed 3282.19 samples/sec Loss 4.2775 LearningRate 0.0274 Epoch: 9 Global Step: 118290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:57:30,763-Speed 3308.89 samples/sec Loss 4.3150 LearningRate 0.0274 Epoch: 9 Global Step: 118300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:57:33,859-Speed 3308.24 samples/sec Loss 4.3764 LearningRate 0.0274 Epoch: 9 Global Step: 118310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:57:37,015-Speed 3246.32 samples/sec Loss 4.4921 LearningRate 0.0274 Epoch: 9 Global Step: 118320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:57:40,128-Speed 3289.92 samples/sec Loss 4.3251 LearningRate 0.0274 Epoch: 9 Global Step: 118330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
11:57:43,222-Speed 3310.61 samples/sec Loss 4.4612 LearningRate 0.0274 Epoch: 9 Global Step: 118340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:57:46,309-Speed 3318.97 samples/sec Loss 4.4010 LearningRate 0.0274 Epoch: 9 Global Step: 118350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:57:49,361-Speed 3356.14 samples/sec Loss 4.3398 LearningRate 0.0274 Epoch: 9 Global Step: 118360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:52,465-Speed 3299.70 samples/sec Loss 4.3229 LearningRate 0.0274 Epoch: 9 Global Step: 118370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:55,575-Speed 3293.57 samples/sec Loss 4.3785 LearningRate 0.0274 Epoch: 9 Global Step: 118380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:57:58,647-Speed 3334.93 samples/sec Loss 4.3455 LearningRate 0.0274 Epoch: 9 Global Step: 118390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:58:01,756-Speed 3295.15 samples/sec Loss 4.3831 LearningRate 0.0274 Epoch: 9 Global Step: 118400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:58:04,823-Speed 3339.53 samples/sec Loss 4.3389 LearningRate 0.0274 Epoch: 9 Global Step: 118410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:58:07,909-Speed 3318.86 samples/sec Loss 4.3477 LearningRate 0.0274 Epoch: 9 Global Step: 118420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:58:11,017-Speed 3295.49 samples/sec Loss 4.3831 LearningRate 0.0274 Epoch: 9 Global Step: 118430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:58:14,160-Speed 3259.41 samples/sec Loss 4.3572 LearningRate 0.0274 Epoch: 9 Global Step: 118440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:58:17,280-Speed 3283.51 samples/sec Loss 4.3067 LearningRate 0.0274 Epoch: 9 Global Step: 118450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 11:58:20,399-Speed 3283.50 samples/sec Loss 
4.3317 LearningRate 0.0274 Epoch: 9 Global Step: 118460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:23,470-Speed 3335.84 samples/sec Loss 4.2873 LearningRate 0.0274 Epoch: 9 Global Step: 118470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:26,540-Speed 3336.87 samples/sec Loss 4.3639 LearningRate 0.0274 Epoch: 9 Global Step: 118480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:29,608-Speed 3338.46 samples/sec Loss 4.2961 LearningRate 0.0274 Epoch: 9 Global Step: 118490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:32,685-Speed 3329.32 samples/sec Loss 4.3146 LearningRate 0.0274 Epoch: 9 Global Step: 118500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:35,756-Speed 3334.46 samples/sec Loss 4.3439 LearningRate 0.0273 Epoch: 9 Global Step: 118510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:38,852-Speed 3309.22 samples/sec Loss 4.2938 LearningRate 0.0273 Epoch: 9 Global Step: 118520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:41,931-Speed 3326.45 samples/sec Loss 4.4545 LearningRate 0.0273 Epoch: 9 Global Step: 118530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:44,997-Speed 3341.25 samples/sec Loss 4.4694 LearningRate 0.0273 Epoch: 9 Global Step: 118540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:48,084-Speed 3317.57 samples/sec Loss 4.4085 LearningRate 0.0273 Epoch: 9 Global Step: 118550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:51,173-Speed 3315.63 samples/sec Loss 4.3592 LearningRate 0.0273 Epoch: 9 Global Step: 118560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:58:54,286-Speed 3291.13 samples/sec Loss 4.3692 LearningRate 0.0273 Epoch: 9 Global Step: 118570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:58:57,401-Speed 3287.99 samples/sec Loss 4.3708 LearningRate 0.0273 Epoch: 9 Global 
Step: 118580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:00,506-Speed 3298.35 samples/sec Loss 4.3638 LearningRate 0.0273 Epoch: 9 Global Step: 118590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:03,632-Speed 3277.43 samples/sec Loss 4.3726 LearningRate 0.0273 Epoch: 9 Global Step: 118600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:06,708-Speed 3329.32 samples/sec Loss 4.3296 LearningRate 0.0273 Epoch: 9 Global Step: 118610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:09,788-Speed 3326.25 samples/sec Loss 4.2780 LearningRate 0.0273 Epoch: 9 Global Step: 118620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:12,903-Speed 3288.23 samples/sec Loss 4.3551 LearningRate 0.0273 Epoch: 9 Global Step: 118630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:15,986-Speed 3322.56 samples/sec Loss 4.4346 LearningRate 0.0273 Epoch: 9 Global Step: 118640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:19,074-Speed 3317.06 samples/sec Loss 4.3797 LearningRate 0.0273 Epoch: 9 Global Step: 118650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:22,145-Speed 3334.77 samples/sec Loss 4.3815 LearningRate 0.0273 Epoch: 9 Global Step: 118660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:25,260-Speed 3288.54 samples/sec Loss 4.3708 LearningRate 0.0273 Epoch: 9 Global Step: 118670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 11:59:28,413-Speed 3249.20 samples/sec Loss 4.3768 LearningRate 0.0273 Epoch: 9 Global Step: 118680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:31,531-Speed 3285.28 samples/sec Loss 4.3360 LearningRate 0.0273 Epoch: 9 Global Step: 118690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:34,595-Speed 3343.18 samples/sec Loss 4.3752 LearningRate 0.0273 Epoch: 9 Global Step: 118700 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 11:59:37,692-Speed 3307.72 samples/sec Loss 4.3589 LearningRate 0.0273 Epoch: 9 Global Step: 118710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:40,850-Speed 3243.07 samples/sec Loss 4.4774 LearningRate 0.0273 Epoch: 9 Global Step: 118720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:43,989-Speed 3263.17 samples/sec Loss 4.3686 LearningRate 0.0273 Epoch: 9 Global Step: 118730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:47,062-Speed 3333.72 samples/sec Loss 4.3883 LearningRate 0.0273 Epoch: 9 Global Step: 118740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 11:59:50,093-Speed 3378.83 samples/sec Loss 4.3977 LearningRate 0.0272 Epoch: 9 Global Step: 118750 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:59:53,263-Speed 3231.88 samples/sec Loss 4.4338 LearningRate 0.0272 Epoch: 9 Global Step: 118760 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:59:56,397-Speed 3268.45 samples/sec Loss 4.3670 LearningRate 0.0272 Epoch: 9 Global Step: 118770 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 11:59:59,474-Speed 3328.34 samples/sec Loss 4.3655 LearningRate 0.0272 Epoch: 9 Global Step: 118780 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:00:02,629-Speed 3246.32 samples/sec Loss 4.4592 LearningRate 0.0272 Epoch: 9 Global Step: 118790 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:00:05,720-Speed 3314.44 samples/sec Loss 4.3520 LearningRate 0.0272 Epoch: 9 Global Step: 118800 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:00:08,798-Speed 3327.54 samples/sec Loss 4.4119 LearningRate 0.0272 Epoch: 9 Global Step: 118810 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:00:11,863-Speed 3342.51 samples/sec Loss 4.3879 LearningRate 0.0272 Epoch: 9 Global Step: 118820 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 
12:00:14,985-Speed 3280.84 samples/sec Loss 4.3276 LearningRate 0.0272 Epoch: 9 Global Step: 118830 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:00:18,132-Speed 3254.37 samples/sec Loss 4.4289 LearningRate 0.0272 Epoch: 9 Global Step: 118840 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:00:21,212-Speed 3326.54 samples/sec Loss 4.3742 LearningRate 0.0272 Epoch: 9 Global Step: 118850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:24,296-Speed 3320.68 samples/sec Loss 4.3705 LearningRate 0.0272 Epoch: 9 Global Step: 118860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:27,391-Speed 3309.35 samples/sec Loss 4.3285 LearningRate 0.0272 Epoch: 9 Global Step: 118870 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:30,470-Speed 3327.29 samples/sec Loss 4.3303 LearningRate 0.0272 Epoch: 9 Global Step: 118880 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:33,565-Speed 3310.18 samples/sec Loss 4.3709 LearningRate 0.0272 Epoch: 9 Global Step: 118890 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:36,679-Speed 3288.62 samples/sec Loss 4.3671 LearningRate 0.0272 Epoch: 9 Global Step: 118900 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:39,831-Speed 3249.79 samples/sec Loss 4.3808 LearningRate 0.0272 Epoch: 9 Global Step: 118910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:42,924-Speed 3311.83 samples/sec Loss 4.3184 LearningRate 0.0272 Epoch: 9 Global Step: 118920 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:46,000-Speed 3330.34 samples/sec Loss 4.3119 LearningRate 0.0272 Epoch: 9 Global Step: 118930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:49,156-Speed 3245.66 samples/sec Loss 4.3107 LearningRate 0.0272 Epoch: 9 Global Step: 118940 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:00:52,246-Speed 3314.25 samples/sec Loss 
4.3852 LearningRate 0.0272 Epoch: 9 Global Step: 118950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:00:55,393-Speed 3255.22 samples/sec Loss 4.3350 LearningRate 0.0272 Epoch: 9 Global Step: 118960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:00:58,455-Speed 3345.32 samples/sec Loss 4.3996 LearningRate 0.0272 Epoch: 9 Global Step: 118970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:01,571-Speed 3287.21 samples/sec Loss 4.3710 LearningRate 0.0271 Epoch: 9 Global Step: 118980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:04,668-Speed 3307.93 samples/sec Loss 4.2645 LearningRate 0.0271 Epoch: 9 Global Step: 118990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:07,748-Speed 3325.65 samples/sec Loss 4.3422 LearningRate 0.0271 Epoch: 9 Global Step: 119000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:10,836-Speed 3316.89 samples/sec Loss 4.3782 LearningRate 0.0271 Epoch: 9 Global Step: 119010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:13,922-Speed 3318.54 samples/sec Loss 4.3662 LearningRate 0.0271 Epoch: 9 Global Step: 119020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:17,052-Speed 3273.29 samples/sec Loss 4.3803 LearningRate 0.0271 Epoch: 9 Global Step: 119030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:20,131-Speed 3326.68 samples/sec Loss 4.3613 LearningRate 0.0271 Epoch: 9 Global Step: 119040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:23,293-Speed 3239.93 samples/sec Loss 4.3226 LearningRate 0.0271 Epoch: 9 Global Step: 119050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:01:26,373-Speed 3324.91 samples/sec Loss 4.3689 LearningRate 0.0271 Epoch: 9 Global Step: 119060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:29,462-Speed 3316.22 samples/sec Loss 4.3040 LearningRate 0.0271 Epoch: 9 Global 
Step: 119070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:32,575-Speed 3291.05 samples/sec Loss 4.4025 LearningRate 0.0271 Epoch: 9 Global Step: 119080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:35,661-Speed 3318.77 samples/sec Loss 4.3796 LearningRate 0.0271 Epoch: 9 Global Step: 119090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:38,768-Speed 3297.47 samples/sec Loss 4.4553 LearningRate 0.0271 Epoch: 9 Global Step: 119100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:41,858-Speed 3314.58 samples/sec Loss 4.3716 LearningRate 0.0271 Epoch: 9 Global Step: 119110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:44,959-Speed 3303.50 samples/sec Loss 4.3840 LearningRate 0.0271 Epoch: 9 Global Step: 119120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:48,083-Speed 3278.99 samples/sec Loss 4.4319 LearningRate 0.0271 Epoch: 9 Global Step: 119130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:51,215-Speed 3270.06 samples/sec Loss 4.4317 LearningRate 0.0271 Epoch: 9 Global Step: 119140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:54,305-Speed 3314.77 samples/sec Loss 4.3341 LearningRate 0.0271 Epoch: 9 Global Step: 119150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:01:57,362-Speed 3351.28 samples/sec Loss 4.3114 LearningRate 0.0271 Epoch: 9 Global Step: 119160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:00,494-Speed 3270.56 samples/sec Loss 4.3709 LearningRate 0.0271 Epoch: 9 Global Step: 119170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:03,626-Speed 3270.52 samples/sec Loss 4.3728 LearningRate 0.0271 Epoch: 9 Global Step: 119180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:06,719-Speed 3311.56 samples/sec Loss 4.4833 LearningRate 0.0271 Epoch: 9 Global Step: 119190 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 12:02:09,808-Speed 3316.42 samples/sec Loss 4.3127 LearningRate 0.0271 Epoch: 9 Global Step: 119200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:12,882-Speed 3331.94 samples/sec Loss 4.4318 LearningRate 0.0271 Epoch: 9 Global Step: 119210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:15,964-Speed 3323.28 samples/sec Loss 4.2861 LearningRate 0.0270 Epoch: 9 Global Step: 119220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:19,059-Speed 3309.73 samples/sec Loss 4.3693 LearningRate 0.0270 Epoch: 9 Global Step: 119230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:22,126-Speed 3339.74 samples/sec Loss 4.3258 LearningRate 0.0270 Epoch: 9 Global Step: 119240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:25,295-Speed 3232.49 samples/sec Loss 4.3038 LearningRate 0.0270 Epoch: 9 Global Step: 119250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:28,467-Speed 3229.68 samples/sec Loss 4.4249 LearningRate 0.0270 Epoch: 9 Global Step: 119260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:02:31,620-Speed 3248.25 samples/sec Loss 4.4039 LearningRate 0.0270 Epoch: 9 Global Step: 119270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:34,736-Speed 3287.64 samples/sec Loss 4.3931 LearningRate 0.0270 Epoch: 9 Global Step: 119280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:37,867-Speed 3271.20 samples/sec Loss 4.3793 LearningRate 0.0270 Epoch: 9 Global Step: 119290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:41,057-Speed 3211.82 samples/sec Loss 4.4270 LearningRate 0.0270 Epoch: 9 Global Step: 119300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:44,163-Speed 3297.55 samples/sec Loss 4.4089 LearningRate 0.0270 Epoch: 9 Global Step: 119310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
12:02:47,269-Speed 3297.22 samples/sec Loss 4.3134 LearningRate 0.0270 Epoch: 9 Global Step: 119320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:50,427-Speed 3244.10 samples/sec Loss 4.3996 LearningRate 0.0270 Epoch: 9 Global Step: 119330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:53,580-Speed 3248.85 samples/sec Loss 4.3320 LearningRate 0.0270 Epoch: 9 Global Step: 119340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:56,659-Speed 3327.38 samples/sec Loss 4.3724 LearningRate 0.0270 Epoch: 9 Global Step: 119350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:02:59,790-Speed 3271.15 samples/sec Loss 4.4727 LearningRate 0.0270 Epoch: 9 Global Step: 119360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:02,903-Speed 3291.01 samples/sec Loss 4.4656 LearningRate 0.0270 Epoch: 9 Global Step: 119370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:03:05,957-Speed 3353.78 samples/sec Loss 4.3979 LearningRate 0.0270 Epoch: 9 Global Step: 119380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:03:09,014-Speed 3350.91 samples/sec Loss 4.4075 LearningRate 0.0270 Epoch: 9 Global Step: 119390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:03:12,088-Speed 3332.34 samples/sec Loss 4.4000 LearningRate 0.0270 Epoch: 9 Global Step: 119400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:15,207-Speed 3283.93 samples/sec Loss 4.4756 LearningRate 0.0270 Epoch: 9 Global Step: 119410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:18,338-Speed 3271.59 samples/sec Loss 4.3323 LearningRate 0.0270 Epoch: 9 Global Step: 119420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:21,413-Speed 3331.88 samples/sec Loss 4.3249 LearningRate 0.0270 Epoch: 9 Global Step: 119430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:24,560-Speed 3254.81 samples/sec Loss 
4.2967 LearningRate 0.0270 Epoch: 9 Global Step: 119440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:27,658-Speed 3306.10 samples/sec Loss 4.3824 LearningRate 0.0270 Epoch: 9 Global Step: 119450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:30,757-Speed 3305.56 samples/sec Loss 4.3259 LearningRate 0.0269 Epoch: 9 Global Step: 119460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:33,887-Speed 3273.06 samples/sec Loss 4.3973 LearningRate 0.0269 Epoch: 9 Global Step: 119470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:36,998-Speed 3292.76 samples/sec Loss 4.4657 LearningRate 0.0269 Epoch: 9 Global Step: 119480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:40,081-Speed 3322.60 samples/sec Loss 4.4176 LearningRate 0.0269 Epoch: 9 Global Step: 119490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:43,157-Speed 3329.97 samples/sec Loss 4.3787 LearningRate 0.0269 Epoch: 9 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:03:46,252-Speed 3309.25 samples/sec Loss 4.2959 LearningRate 0.0269 Epoch: 9 Global Step: 119510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:49,431-Speed 3222.53 samples/sec Loss 4.3571 LearningRate 0.0269 Epoch: 9 Global Step: 119520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:52,598-Speed 3234.17 samples/sec Loss 4.2995 LearningRate 0.0269 Epoch: 9 Global Step: 119530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:55,754-Speed 3245.50 samples/sec Loss 4.3501 LearningRate 0.0269 Epoch: 9 Global Step: 119540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:03:58,806-Speed 3355.98 samples/sec Loss 4.4159 LearningRate 0.0269 Epoch: 9 Global Step: 119550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:01,960-Speed 3247.94 samples/sec Loss 4.4370 LearningRate 0.0269 Epoch: 9 Global 
Step: 119560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:05,104-Speed 3258.32 samples/sec Loss 4.3307 LearningRate 0.0269 Epoch: 9 Global Step: 119570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:08,257-Speed 3248.71 samples/sec Loss 4.4418 LearningRate 0.0269 Epoch: 9 Global Step: 119580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:11,370-Speed 3290.23 samples/sec Loss 4.4782 LearningRate 0.0269 Epoch: 9 Global Step: 119590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:14,427-Speed 3351.00 samples/sec Loss 4.3796 LearningRate 0.0269 Epoch: 9 Global Step: 119600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:17,500-Speed 3333.53 samples/sec Loss 4.3742 LearningRate 0.0269 Epoch: 9 Global Step: 119610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:04:20,621-Speed 3282.04 samples/sec Loss 4.3971 LearningRate 0.0269 Epoch: 9 Global Step: 119620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:04:23,725-Speed 3298.94 samples/sec Loss 4.3921 LearningRate 0.0269 Epoch: 9 Global Step: 119630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:26,815-Speed 3315.75 samples/sec Loss 4.3269 LearningRate 0.0269 Epoch: 9 Global Step: 119640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:29,875-Speed 3346.78 samples/sec Loss 4.5368 LearningRate 0.0269 Epoch: 9 Global Step: 119650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:32,930-Speed 3353.88 samples/sec Loss 4.3851 LearningRate 0.0269 Epoch: 9 Global Step: 119660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:35,995-Speed 3341.51 samples/sec Loss 4.4353 LearningRate 0.0269 Epoch: 9 Global Step: 119670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:39,066-Speed 3336.10 samples/sec Loss 4.5071 LearningRate 0.0269 Epoch: 9 Global Step: 119680 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 12:04:42,227-Speed 3239.48 samples/sec Loss 4.3888 LearningRate 0.0269 Epoch: 9 Global Step: 119690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:04:45,306-Speed 3327.75 samples/sec Loss 4.4363 LearningRate 0.0268 Epoch: 9 Global Step: 119700 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:04:48,385-Speed 3326.50 samples/sec Loss 4.4225 LearningRate 0.0268 Epoch: 9 Global Step: 119710 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:04:51,437-Speed 3356.32 samples/sec Loss 4.2982 LearningRate 0.0268 Epoch: 9 Global Step: 119720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:04:54,524-Speed 3318.36 samples/sec Loss 4.4509 LearningRate 0.0268 Epoch: 9 Global Step: 119730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:04:57,573-Speed 3360.16 samples/sec Loss 4.5080 LearningRate 0.0268 Epoch: 9 Global Step: 119740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:05:00,652-Speed 3326.15 samples/sec Loss 4.3704 LearningRate 0.0268 Epoch: 9 Global Step: 119750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:05:03,802-Speed 3252.82 samples/sec Loss 4.4492 LearningRate 0.0268 Epoch: 9 Global Step: 119760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:05:06,957-Speed 3246.62 samples/sec Loss 4.4769 LearningRate 0.0268 Epoch: 9 Global Step: 119770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:05:10,088-Speed 3271.47 samples/sec Loss 4.3597 LearningRate 0.0268 Epoch: 9 Global Step: 119780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:05:13,242-Speed 3248.12 samples/sec Loss 4.4306 LearningRate 0.0268 Epoch: 9 Global Step: 119790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:05:16,336-Speed 3310.96 samples/sec Loss 4.3990 LearningRate 0.0268 Epoch: 9 Global Step: 119800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
12:05:19,397-Speed 3345.96 samples/sec Loss 4.5161 LearningRate 0.0268 Epoch: 9 Global Step: 119810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:22,508-Speed 3292.87 samples/sec Loss 4.3783 LearningRate 0.0268 Epoch: 9 Global Step: 119820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:25,720-Speed 3188.71 samples/sec Loss 4.3304 LearningRate 0.0268 Epoch: 9 Global Step: 119830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:28,800-Speed 3325.49 samples/sec Loss 4.4739 LearningRate 0.0268 Epoch: 9 Global Step: 119840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:31,906-Speed 3298.45 samples/sec Loss 4.3194 LearningRate 0.0268 Epoch: 9 Global Step: 119850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:34,991-Speed 3320.77 samples/sec Loss 4.3875 LearningRate 0.0268 Epoch: 9 Global Step: 119860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:38,048-Speed 3349.92 samples/sec Loss 4.4259 LearningRate 0.0268 Epoch: 9 Global Step: 119870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:41,104-Speed 3352.01 samples/sec Loss 4.4204 LearningRate 0.0268 Epoch: 9 Global Step: 119880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:44,165-Speed 3346.21 samples/sec Loss 4.3753 LearningRate 0.0268 Epoch: 9 Global Step: 119890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:47,233-Speed 3339.56 samples/sec Loss 4.3215 LearningRate 0.0268 Epoch: 9 Global Step: 119900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:50,373-Speed 3261.86 samples/sec Loss 4.3723 LearningRate 0.0268 Epoch: 9 Global Step: 119910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:53,532-Speed 3242.84 samples/sec Loss 4.3701 LearningRate 0.0268 Epoch: 9 Global Step: 119920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:56,649-Speed 3285.93 samples/sec Loss 
4.2759 LearningRate 0.0268 Epoch: 9 Global Step: 119930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:05:59,811-Speed 3239.57 samples/sec Loss 4.3373 LearningRate 0.0267 Epoch: 9 Global Step: 119940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:02,917-Speed 3298.34 samples/sec Loss 4.4442 LearningRate 0.0267 Epoch: 9 Global Step: 119950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:06,056-Speed 3262.53 samples/sec Loss 4.4057 LearningRate 0.0267 Epoch: 9 Global Step: 119960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:09,155-Speed 3304.83 samples/sec Loss 4.3688 LearningRate 0.0267 Epoch: 9 Global Step: 119970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:12,302-Speed 3255.52 samples/sec Loss 4.4042 LearningRate 0.0267 Epoch: 9 Global Step: 119980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:15,441-Speed 3263.24 samples/sec Loss 4.4053 LearningRate 0.0267 Epoch: 9 Global Step: 119990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:18,606-Speed 3236.86 samples/sec Loss 4.3431 LearningRate 0.0267 Epoch: 9 Global Step: 120000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:06:21,713-Speed 3296.10 samples/sec Loss 4.3553 LearningRate 0.0267 Epoch: 9 Global Step: 120010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:06:24,762-Speed 3359.91 samples/sec Loss 4.4078 LearningRate 0.0267 Epoch: 9 Global Step: 120020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:27,829-Speed 3339.87 samples/sec Loss 4.3810 LearningRate 0.0267 Epoch: 9 Global Step: 120030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:30,948-Speed 3284.54 samples/sec Loss 4.3073 LearningRate 0.0267 Epoch: 9 Global Step: 120040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:34,033-Speed 3320.23 samples/sec Loss 4.4848 LearningRate 0.0267 Epoch: 9 Global 
Step: 120050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:37,126-Speed 3311.64 samples/sec Loss 4.3942 LearningRate 0.0267 Epoch: 9 Global Step: 120060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:40,277-Speed 3250.32 samples/sec Loss 4.3547 LearningRate 0.0267 Epoch: 9 Global Step: 120070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:43,435-Speed 3243.87 samples/sec Loss 4.3332 LearningRate 0.0267 Epoch: 9 Global Step: 120080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:46,509-Speed 3332.69 samples/sec Loss 4.3154 LearningRate 0.0267 Epoch: 9 Global Step: 120090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:49,574-Speed 3341.47 samples/sec Loss 4.4154 LearningRate 0.0267 Epoch: 9 Global Step: 120100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:52,683-Speed 3295.39 samples/sec Loss 4.4589 LearningRate 0.0267 Epoch: 9 Global Step: 120110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:06:55,818-Speed 3267.31 samples/sec Loss 4.4304 LearningRate 0.0267 Epoch: 9 Global Step: 120120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:06:58,905-Speed 3318.18 samples/sec Loss 4.3609 LearningRate 0.0267 Epoch: 9 Global Step: 120130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:07:01,987-Speed 3324.01 samples/sec Loss 4.4789 LearningRate 0.0267 Epoch: 9 Global Step: 120140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:07:05,060-Speed 3332.56 samples/sec Loss 4.3942 LearningRate 0.0267 Epoch: 9 Global Step: 120150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:07:08,123-Speed 3344.42 samples/sec Loss 4.4992 LearningRate 0.0267 Epoch: 9 Global Step: 120160 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:11,198-Speed 3331.14 samples/sec Loss 4.4158 LearningRate 0.0267 Epoch: 9 Global Step: 120170 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-04-27 12:07:14,346-Speed 3254.36 samples/sec Loss 4.3473 LearningRate 0.0266 Epoch: 9 Global Step: 120180 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:17,471-Speed 3277.59 samples/sec Loss 4.3496 LearningRate 0.0266 Epoch: 9 Global Step: 120190 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:20,606-Speed 3267.35 samples/sec Loss 4.3794 LearningRate 0.0266 Epoch: 9 Global Step: 120200 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:23,721-Speed 3288.35 samples/sec Loss 4.3962 LearningRate 0.0266 Epoch: 9 Global Step: 120210 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:26,887-Speed 3235.41 samples/sec Loss 4.3294 LearningRate 0.0266 Epoch: 9 Global Step: 120220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:30,022-Speed 3267.12 samples/sec Loss 4.4649 LearningRate 0.0266 Epoch: 9 Global Step: 120230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:33,138-Speed 3287.72 samples/sec Loss 4.5289 LearningRate 0.0266 Epoch: 9 Global Step: 120240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:36,258-Speed 3283.31 samples/sec Loss 4.3415 LearningRate 0.0266 Epoch: 9 Global Step: 120250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:07:39,353-Speed 3308.94 samples/sec Loss 4.3547 LearningRate 0.0266 Epoch: 9 Global Step: 120260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:07:42,510-Speed 3244.35 samples/sec Loss 4.3852 LearningRate 0.0266 Epoch: 9 Global Step: 120270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:07:45,582-Speed 3334.87 samples/sec Loss 4.4653 LearningRate 0.0266 Epoch: 9 Global Step: 120280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:07:48,711-Speed 3274.41 samples/sec Loss 4.3391 LearningRate 0.0266 Epoch: 9 Global Step: 120290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
12:07:51,797-Speed 3319.21 samples/sec Loss 4.4226 LearningRate 0.0266 Epoch: 9 Global Step: 120300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:07:54,874-Speed 3329.11 samples/sec Loss 4.4107 LearningRate 0.0266 Epoch: 9 Global Step: 120310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:07:57,959-Speed 3319.98 samples/sec Loss 4.3316 LearningRate 0.0266 Epoch: 9 Global Step: 120320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:01,071-Speed 3290.81 samples/sec Loss 4.3836 LearningRate 0.0266 Epoch: 9 Global Step: 120330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:04,243-Speed 3229.68 samples/sec Loss 4.3469 LearningRate 0.0266 Epoch: 9 Global Step: 120340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:07,307-Speed 3343.51 samples/sec Loss 4.3121 LearningRate 0.0266 Epoch: 9 Global Step: 120350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:10,366-Speed 3347.85 samples/sec Loss 4.4397 LearningRate 0.0266 Epoch: 9 Global Step: 120360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:08:13,536-Speed 3231.98 samples/sec Loss 4.3774 LearningRate 0.0266 Epoch: 9 Global Step: 120370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:08:16,685-Speed 3252.49 samples/sec Loss 4.3274 LearningRate 0.0266 Epoch: 9 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:08:19,777-Speed 3313.60 samples/sec Loss 4.3997 LearningRate 0.0266 Epoch: 9 Global Step: 120390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:08:22,864-Speed 3318.35 samples/sec Loss 4.3873 LearningRate 0.0266 Epoch: 9 Global Step: 120400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:08:25,970-Speed 3297.19 samples/sec Loss 4.3800 LearningRate 0.0266 Epoch: 9 Global Step: 120410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:08:29,025-Speed 3353.35 samples/sec Loss 
4.4255 LearningRate 0.0265 Epoch: 9 Global Step: 120420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:08:32,064-Speed 3371.01 samples/sec Loss 4.4254 LearningRate 0.0265 Epoch: 9 Global Step: 120430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:35,129-Speed 3342.08 samples/sec Loss 4.4889 LearningRate 0.0265 Epoch: 9 Global Step: 120440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:38,246-Speed 3286.28 samples/sec Loss 4.3893 LearningRate 0.0265 Epoch: 9 Global Step: 120450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:41,408-Speed 3239.04 samples/sec Loss 4.3974 LearningRate 0.0265 Epoch: 9 Global Step: 120460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:44,498-Speed 3315.38 samples/sec Loss 4.4131 LearningRate 0.0265 Epoch: 9 Global Step: 120470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:47,579-Speed 3324.32 samples/sec Loss 4.4246 LearningRate 0.0265 Epoch: 9 Global Step: 120480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:50,631-Speed 3356.68 samples/sec Loss 4.4260 LearningRate 0.0265 Epoch: 9 Global Step: 120490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:53,758-Speed 3275.39 samples/sec Loss 4.4513 LearningRate 0.0265 Epoch: 9 Global Step: 120500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:08:56,813-Speed 3352.33 samples/sec Loss 4.4631 LearningRate 0.0265 Epoch: 9 Global Step: 120510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:08:59,871-Speed 3350.63 samples/sec Loss 4.4541 LearningRate 0.0265 Epoch: 9 Global Step: 120520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:02,938-Speed 3339.87 samples/sec Loss 4.5345 LearningRate 0.0265 Epoch: 9 Global Step: 120530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:06,068-Speed 3272.88 samples/sec Loss 4.3634 LearningRate 0.0265 Epoch: 9 Global 
Step: 120540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:09,132-Speed 3342.99 samples/sec Loss 4.3859 LearningRate 0.0265 Epoch: 9 Global Step: 120550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:12,242-Speed 3293.52 samples/sec Loss 4.4061 LearningRate 0.0265 Epoch: 9 Global Step: 120560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:15,379-Speed 3266.10 samples/sec Loss 4.2752 LearningRate 0.0265 Epoch: 9 Global Step: 120570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:18,574-Speed 3205.88 samples/sec Loss 4.4117 LearningRate 0.0265 Epoch: 9 Global Step: 120580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:21,634-Speed 3346.86 samples/sec Loss 4.3452 LearningRate 0.0265 Epoch: 9 Global Step: 120590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:24,732-Speed 3306.18 samples/sec Loss 4.3453 LearningRate 0.0265 Epoch: 9 Global Step: 120600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:27,825-Speed 3312.14 samples/sec Loss 4.4563 LearningRate 0.0265 Epoch: 9 Global Step: 120610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:09:30,982-Speed 3244.35 samples/sec Loss 4.3646 LearningRate 0.0265 Epoch: 9 Global Step: 120620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:09:34,060-Speed 3327.57 samples/sec Loss 4.3579 LearningRate 0.0265 Epoch: 9 Global Step: 120630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:09:37,120-Speed 3348.11 samples/sec Loss 4.5252 LearningRate 0.0265 Epoch: 9 Global Step: 120640 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:40,197-Speed 3328.34 samples/sec Loss 4.3865 LearningRate 0.0265 Epoch: 9 Global Step: 120650 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:43,321-Speed 3279.24 samples/sec Loss 4.3776 LearningRate 0.0264 Epoch: 9 Global Step: 120660 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-04-27 12:09:46,410-Speed 3316.59 samples/sec Loss 4.3809 LearningRate 0.0264 Epoch: 9 Global Step: 120670 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:49,522-Speed 3291.42 samples/sec Loss 4.3053 LearningRate 0.0264 Epoch: 9 Global Step: 120680 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:52,706-Speed 3217.34 samples/sec Loss 4.4005 LearningRate 0.0264 Epoch: 9 Global Step: 120690 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:55,821-Speed 3288.33 samples/sec Loss 4.4633 LearningRate 0.0264 Epoch: 9 Global Step: 120700 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:09:58,921-Speed 3303.92 samples/sec Loss 4.3545 LearningRate 0.0264 Epoch: 9 Global Step: 120710 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:02,008-Speed 3318.87 samples/sec Loss 4.4791 LearningRate 0.0264 Epoch: 9 Global Step: 120720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:05,097-Speed 3315.36 samples/sec Loss 4.4331 LearningRate 0.0264 Epoch: 9 Global Step: 120730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:08,236-Speed 3262.98 samples/sec Loss 4.4210 LearningRate 0.0264 Epoch: 9 Global Step: 120740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:11,310-Speed 3332.11 samples/sec Loss 4.4257 LearningRate 0.0264 Epoch: 9 Global Step: 120750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:14,375-Speed 3342.51 samples/sec Loss 4.3699 LearningRate 0.0264 Epoch: 9 Global Step: 120760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:17,437-Speed 3345.21 samples/sec Loss 4.4056 LearningRate 0.0264 Epoch: 9 Global Step: 120770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:20,520-Speed 3323.13 samples/sec Loss 4.3791 LearningRate 0.0264 Epoch: 9 Global Step: 120780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
12:10:23,637-Speed 3285.39 samples/sec Loss 4.4144 LearningRate 0.0264 Epoch: 9 Global Step: 120790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:26,756-Speed 3284.54 samples/sec Loss 4.3760 LearningRate 0.0264 Epoch: 9 Global Step: 120800 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:29,834-Speed 3327.62 samples/sec Loss 4.3746 LearningRate 0.0264 Epoch: 9 Global Step: 120810 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:32,910-Speed 3330.39 samples/sec Loss 4.4294 LearningRate 0.0264 Epoch: 9 Global Step: 120820 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:35,975-Speed 3342.97 samples/sec Loss 4.4598 LearningRate 0.0264 Epoch: 9 Global Step: 120830 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:39,048-Speed 3332.37 samples/sec Loss 4.4450 LearningRate 0.0264 Epoch: 9 Global Step: 120840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:10:42,098-Speed 3359.20 samples/sec Loss 4.4389 LearningRate 0.0264 Epoch: 9 Global Step: 120850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:45,170-Speed 3333.62 samples/sec Loss 4.3705 LearningRate 0.0264 Epoch: 9 Global Step: 120860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:48,248-Speed 3327.94 samples/sec Loss 4.3620 LearningRate 0.0264 Epoch: 9 Global Step: 120870 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:51,314-Speed 3341.50 samples/sec Loss 4.4798 LearningRate 0.0264 Epoch: 9 Global Step: 120880 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:54,368-Speed 3354.14 samples/sec Loss 4.4204 LearningRate 0.0264 Epoch: 9 Global Step: 120890 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:10:57,452-Speed 3321.49 samples/sec Loss 4.4332 LearningRate 0.0264 Epoch: 9 Global Step: 120900 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:00,505-Speed 3354.94 samples/sec Loss 
4.2844 LearningRate 0.0263 Epoch: 9 Global Step: 120910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:03,564-Speed 3348.93 samples/sec Loss 4.4118 LearningRate 0.0263 Epoch: 9 Global Step: 120920 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:06,673-Speed 3296.32 samples/sec Loss 4.4750 LearningRate 0.0263 Epoch: 9 Global Step: 120930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:09,732-Speed 3348.30 samples/sec Loss 4.4227 LearningRate 0.0263 Epoch: 9 Global Step: 120940 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:12,824-Speed 3312.95 samples/sec Loss 4.4192 LearningRate 0.0263 Epoch: 9 Global Step: 120950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:15,917-Speed 3311.37 samples/sec Loss 4.4060 LearningRate 0.0263 Epoch: 9 Global Step: 120960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:19,052-Speed 3267.44 samples/sec Loss 4.3740 LearningRate 0.0263 Epoch: 9 Global Step: 120970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:22,106-Speed 3354.60 samples/sec Loss 4.3845 LearningRate 0.0263 Epoch: 9 Global Step: 120980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:25,282-Speed 3224.82 samples/sec Loss 4.4285 LearningRate 0.0263 Epoch: 9 Global Step: 120990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:28,478-Speed 3205.06 samples/sec Loss 4.4246 LearningRate 0.0263 Epoch: 9 Global Step: 121000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:31,558-Speed 3325.68 samples/sec Loss 4.3244 LearningRate 0.0263 Epoch: 9 Global Step: 121010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:34,651-Speed 3312.47 samples/sec Loss 4.3902 LearningRate 0.0263 Epoch: 9 Global Step: 121020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:37,809-Speed 3243.71 samples/sec Loss 4.3678 LearningRate 0.0263 Epoch: 9 Global 
Step: 121030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:11:40,931-Speed 3280.41 samples/sec Loss 4.4785 LearningRate 0.0263 Epoch: 9 Global Step: 121040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:44,089-Speed 3244.38 samples/sec Loss 4.3700 LearningRate 0.0263 Epoch: 9 Global Step: 121050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:47,198-Speed 3294.07 samples/sec Loss 4.3708 LearningRate 0.0263 Epoch: 9 Global Step: 121060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:50,326-Speed 3275.10 samples/sec Loss 4.4480 LearningRate 0.0263 Epoch: 9 Global Step: 121070 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:53,457-Speed 3270.92 samples/sec Loss 4.5173 LearningRate 0.0263 Epoch: 9 Global Step: 121080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:56,519-Speed 3346.05 samples/sec Loss 4.4340 LearningRate 0.0263 Epoch: 9 Global Step: 121090 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:11:59,662-Speed 3258.36 samples/sec Loss 4.4525 LearningRate 0.0263 Epoch: 9 Global Step: 121100 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:02,793-Speed 3271.20 samples/sec Loss 4.4788 LearningRate 0.0263 Epoch: 9 Global Step: 121110 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:05,923-Speed 3272.78 samples/sec Loss 4.4540 LearningRate 0.0263 Epoch: 9 Global Step: 121120 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:09,011-Speed 3317.26 samples/sec Loss 4.3507 LearningRate 0.0263 Epoch: 9 Global Step: 121130 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:12,131-Speed 3283.01 samples/sec Loss 4.3721 LearningRate 0.0263 Epoch: 9 Global Step: 121140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:12:15,222-Speed 3314.31 samples/sec Loss 4.4170 LearningRate 0.0262 Epoch: 9 Global Step: 121150 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 12:12:18,388-Speed 3235.27 samples/sec Loss 4.3611 LearningRate 0.0262 Epoch: 9 Global Step: 121160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:12:21,504-Speed 3288.00 samples/sec Loss 4.3967 LearningRate 0.0262 Epoch: 9 Global Step: 121170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:12:24,579-Speed 3331.13 samples/sec Loss 4.3796 LearningRate 0.0262 Epoch: 9 Global Step: 121180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:12:27,678-Speed 3304.98 samples/sec Loss 4.4102 LearningRate 0.0262 Epoch: 9 Global Step: 121190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:12:30,781-Speed 3300.81 samples/sec Loss 4.4500 LearningRate 0.0262 Epoch: 9 Global Step: 121200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:12:33,855-Speed 3332.39 samples/sec Loss 4.3846 LearningRate 0.0262 Epoch: 9 Global Step: 121210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:12:36,952-Speed 3307.53 samples/sec Loss 4.4471 LearningRate 0.0262 Epoch: 9 Global Step: 121220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:40,026-Speed 3332.04 samples/sec Loss 4.3082 LearningRate 0.0262 Epoch: 9 Global Step: 121230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:43,132-Speed 3298.24 samples/sec Loss 4.3674 LearningRate 0.0262 Epoch: 9 Global Step: 121240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:46,225-Speed 3312.06 samples/sec Loss 4.5021 LearningRate 0.0262 Epoch: 9 Global Step: 121250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:49,311-Speed 3319.59 samples/sec Loss 4.5222 LearningRate 0.0262 Epoch: 9 Global Step: 121260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:52,442-Speed 3270.65 samples/sec Loss 4.4077 LearningRate 0.0262 Epoch: 9 Global Step: 121270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
12:12:55,563-Speed 3283.06 samples/sec Loss 4.4270 LearningRate 0.0262 Epoch: 9 Global Step: 121280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:12:58,651-Speed 3316.88 samples/sec Loss 4.3825 LearningRate 0.0262 Epoch: 9 Global Step: 121290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:13:01,772-Speed 3281.95 samples/sec Loss 4.4568 LearningRate 0.0262 Epoch: 9 Global Step: 121300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:13:04,890-Speed 3285.01 samples/sec Loss 4.4549 LearningRate 0.0262 Epoch: 9 Global Step: 121310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:13:07,994-Speed 3300.51 samples/sec Loss 4.4299 LearningRate 0.0262 Epoch: 9 Global Step: 121320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:11,051-Speed 3350.85 samples/sec Loss 4.4021 LearningRate 0.0262 Epoch: 9 Global Step: 121330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:14,131-Speed 3325.42 samples/sec Loss 4.5201 LearningRate 0.0262 Epoch: 9 Global Step: 121340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:17,240-Speed 3294.30 samples/sec Loss 4.3217 LearningRate 0.0262 Epoch: 9 Global Step: 121350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:20,302-Speed 3344.98 samples/sec Loss 4.4819 LearningRate 0.0262 Epoch: 9 Global Step: 121360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:23,375-Speed 3333.92 samples/sec Loss 4.4690 LearningRate 0.0262 Epoch: 9 Global Step: 121370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:26,447-Speed 3334.75 samples/sec Loss 4.4139 LearningRate 0.0262 Epoch: 9 Global Step: 121380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:29,591-Speed 3257.27 samples/sec Loss 4.4319 LearningRate 0.0261 Epoch: 9 Global Step: 121390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:32,680-Speed 3317.15 samples/sec Loss 
4.4774 LearningRate 0.0261 Epoch: 9 Global Step: 121400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:35,732-Speed 3355.32 samples/sec Loss 4.4164 LearningRate 0.0261 Epoch: 9 Global Step: 121410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:38,825-Speed 3312.16 samples/sec Loss 4.4438 LearningRate 0.0261 Epoch: 9 Global Step: 121420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:13:41,946-Speed 3281.86 samples/sec Loss 4.4327 LearningRate 0.0261 Epoch: 9 Global Step: 121430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:45,016-Speed 3337.37 samples/sec Loss 4.3535 LearningRate 0.0261 Epoch: 9 Global Step: 121440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:13:48,138-Speed 3280.29 samples/sec Loss 4.4567 LearningRate 0.0261 Epoch: 9 Global Step: 121450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:13:51,213-Speed 3331.42 samples/sec Loss 4.4192 LearningRate 0.0261 Epoch: 9 Global Step: 121460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:13:54,343-Speed 3272.38 samples/sec Loss 4.4051 LearningRate 0.0261 Epoch: 9 Global Step: 121470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:13:57,429-Speed 3320.27 samples/sec Loss 4.3749 LearningRate 0.0261 Epoch: 9 Global Step: 121480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:14:00,522-Speed 3311.66 samples/sec Loss 4.3817 LearningRate 0.0261 Epoch: 9 Global Step: 121490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:14:03,599-Speed 3328.72 samples/sec Loss 4.4314 LearningRate 0.0261 Epoch: 9 Global Step: 121500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:14:06,698-Speed 3305.52 samples/sec Loss 4.3198 LearningRate 0.0261 Epoch: 9 Global Step: 121510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:14:09,779-Speed 3324.27 samples/sec Loss 4.3490 LearningRate 0.0261 Epoch: 9 Global 
Step: 121520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:14:12,915-Speed 3266.78 samples/sec Loss 4.3760 LearningRate 0.0261 Epoch: 9 Global Step: 121530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:14:16,047-Speed 3271.03 samples/sec Loss 4.3718 LearningRate 0.0261 Epoch: 9 Global Step: 121540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:14:19,142-Speed 3308.74 samples/sec Loss 4.4432 LearningRate 0.0261 Epoch: 9 Global Step: 121550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:22,194-Speed 3356.23 samples/sec Loss 4.4777 LearningRate 0.0261 Epoch: 9 Global Step: 121560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:25,367-Speed 3228.47 samples/sec Loss 4.4481 LearningRate 0.0261 Epoch: 9 Global Step: 121570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:28,520-Speed 3249.03 samples/sec Loss 4.3765 LearningRate 0.0261 Epoch: 9 Global Step: 121580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:31,682-Speed 3239.17 samples/sec Loss 4.4516 LearningRate 0.0261 Epoch: 9 Global Step: 121590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:34,802-Speed 3283.13 samples/sec Loss 4.3351 LearningRate 0.0261 Epoch: 9 Global Step: 121600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:37,940-Speed 3264.37 samples/sec Loss 4.3725 LearningRate 0.0261 Epoch: 9 Global Step: 121610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:41,036-Speed 3308.40 samples/sec Loss 4.3883 LearningRate 0.0261 Epoch: 9 Global Step: 121620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:44,125-Speed 3316.13 samples/sec Loss 4.4381 LearningRate 0.0260 Epoch: 9 Global Step: 121630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:47,278-Speed 3248.29 samples/sec Loss 4.3535 LearningRate 0.0260 Epoch: 9 Global Step: 121640 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 12:14:50,467-Speed 3212.06 samples/sec Loss 4.4793 LearningRate 0.0260 Epoch: 9 Global Step: 121650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:14:53,571-Speed 3300.76 samples/sec Loss 4.3949 LearningRate 0.0260 Epoch: 9 Global Step: 121660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:56,644-Speed 3333.59 samples/sec Loss 4.3847 LearningRate 0.0260 Epoch: 9 Global Step: 121670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:14:59,733-Speed 3315.36 samples/sec Loss 4.3375 LearningRate 0.0260 Epoch: 9 Global Step: 121680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:02,875-Speed 3260.41 samples/sec Loss 4.3925 LearningRate 0.0260 Epoch: 9 Global Step: 121690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:06,006-Speed 3271.30 samples/sec Loss 4.3658 LearningRate 0.0260 Epoch: 9 Global Step: 121700 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:09,130-Speed 3279.13 samples/sec Loss 4.3809 LearningRate 0.0260 Epoch: 9 Global Step: 121710 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:12,241-Speed 3292.68 samples/sec Loss 4.2813 LearningRate 0.0260 Epoch: 9 Global Step: 121720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:15,422-Speed 3220.07 samples/sec Loss 4.3065 LearningRate 0.0260 Epoch: 9 Global Step: 121730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:18,576-Speed 3247.12 samples/sec Loss 4.4936 LearningRate 0.0260 Epoch: 9 Global Step: 121740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:21,672-Speed 3308.96 samples/sec Loss 4.4343 LearningRate 0.0260 Epoch: 9 Global Step: 121750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:24,765-Speed 3312.32 samples/sec Loss 4.3904 LearningRate 0.0260 Epoch: 9 Global Step: 121760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
12:15:27,852-Speed 3318.61 samples/sec Loss 4.5077 LearningRate 0.0260 Epoch: 9 Global Step: 121770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:31,009-Speed 3244.01 samples/sec Loss 4.4226 LearningRate 0.0260 Epoch: 9 Global Step: 121780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:34,120-Speed 3292.61 samples/sec Loss 4.3900 LearningRate 0.0260 Epoch: 9 Global Step: 121790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:15:37,239-Speed 3284.41 samples/sec Loss 4.3557 LearningRate 0.0260 Epoch: 9 Global Step: 121800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:40,393-Speed 3247.89 samples/sec Loss 4.4298 LearningRate 0.0260 Epoch: 9 Global Step: 121810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:43,586-Speed 3207.11 samples/sec Loss 4.4930 LearningRate 0.0260 Epoch: 9 Global Step: 121820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:46,732-Speed 3256.08 samples/sec Loss 4.4635 LearningRate 0.0260 Epoch: 9 Global Step: 121830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:49,893-Speed 3241.01 samples/sec Loss 4.3878 LearningRate 0.0260 Epoch: 9 Global Step: 121840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:53,024-Speed 3271.40 samples/sec Loss 4.4646 LearningRate 0.0260 Epoch: 9 Global Step: 121850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:56,134-Speed 3294.05 samples/sec Loss 4.4139 LearningRate 0.0260 Epoch: 9 Global Step: 121860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:15:59,247-Speed 3290.25 samples/sec Loss 4.3910 LearningRate 0.0260 Epoch: 9 Global Step: 121870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:02,370-Speed 3279.90 samples/sec Loss 4.3413 LearningRate 0.0259 Epoch: 9 Global Step: 121880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:05,455-Speed 3320.50 samples/sec Loss 
4.3375 LearningRate 0.0259 Epoch: 9 Global Step: 121890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:08,534-Speed 3326.13 samples/sec Loss 4.4213 LearningRate 0.0259 Epoch: 9 Global Step: 121900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:11,632-Speed 3307.17 samples/sec Loss 4.4588 LearningRate 0.0259 Epoch: 9 Global Step: 121910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:14,742-Speed 3293.26 samples/sec Loss 4.3489 LearningRate 0.0259 Epoch: 9 Global Step: 121920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:17,830-Speed 3317.41 samples/sec Loss 4.4283 LearningRate 0.0259 Epoch: 9 Global Step: 121930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:20,948-Speed 3285.29 samples/sec Loss 4.4076 LearningRate 0.0259 Epoch: 9 Global Step: 121940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:24,027-Speed 3326.62 samples/sec Loss 4.4647 LearningRate 0.0259 Epoch: 9 Global Step: 121950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:27,166-Speed 3265.10 samples/sec Loss 4.3869 LearningRate 0.0259 Epoch: 9 Global Step: 121960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:30,361-Speed 3205.99 samples/sec Loss 4.3217 LearningRate 0.0259 Epoch: 9 Global Step: 121970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:33,415-Speed 3354.24 samples/sec Loss 4.4421 LearningRate 0.0259 Epoch: 9 Global Step: 121980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:36,523-Speed 3295.35 samples/sec Loss 4.5264 LearningRate 0.0259 Epoch: 9 Global Step: 121990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:39,669-Speed 3256.46 samples/sec Loss 4.3953 LearningRate 0.0259 Epoch: 9 Global Step: 122000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:16:42,767-Speed 3305.43 samples/sec Loss 4.4772 LearningRate 0.0259 Epoch: 9 Global 
Step: 122010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:45,852-Speed 3321.10 samples/sec Loss 4.3579 LearningRate 0.0259 Epoch: 9 Global Step: 122020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:49,028-Speed 3224.63 samples/sec Loss 4.3876 LearningRate 0.0259 Epoch: 9 Global Step: 122030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:52,209-Speed 3221.14 samples/sec Loss 4.4239 LearningRate 0.0259 Epoch: 9 Global Step: 122040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:55,341-Speed 3269.95 samples/sec Loss 4.3875 LearningRate 0.0259 Epoch: 9 Global Step: 122050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:16:58,417-Speed 3330.59 samples/sec Loss 4.4293 LearningRate 0.0259 Epoch: 9 Global Step: 122060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:01,477-Speed 3346.96 samples/sec Loss 4.4049 LearningRate 0.0259 Epoch: 9 Global Step: 122070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:04,587-Speed 3293.84 samples/sec Loss 4.4557 LearningRate 0.0259 Epoch: 9 Global Step: 122080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:07,692-Speed 3299.43 samples/sec Loss 4.3680 LearningRate 0.0259 Epoch: 9 Global Step: 122090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:10,785-Speed 3311.15 samples/sec Loss 4.4723 LearningRate 0.0259 Epoch: 9 Global Step: 122100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:13,837-Speed 3356.54 samples/sec Loss 4.4100 LearningRate 0.0259 Epoch: 9 Global Step: 122110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:17:16,890-Speed 3354.84 samples/sec Loss 4.4769 LearningRate 0.0258 Epoch: 9 Global Step: 122120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:17:19,961-Speed 3336.03 samples/sec Loss 4.3493 LearningRate 0.0258 Epoch: 9 Global Step: 122130 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-27 12:17:23,043-Speed 3323.36 samples/sec Loss 4.3928 LearningRate 0.0258 Epoch: 9 Global Step: 122140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:26,196-Speed 3248.39 samples/sec Loss 4.4322 LearningRate 0.0258 Epoch: 9 Global Step: 122150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:29,304-Speed 3296.18 samples/sec Loss 4.4138 LearningRate 0.0258 Epoch: 9 Global Step: 122160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:32,429-Speed 3277.39 samples/sec Loss 4.3737 LearningRate 0.0258 Epoch: 9 Global Step: 122170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:35,564-Speed 3267.89 samples/sec Loss 4.3269 LearningRate 0.0258 Epoch: 9 Global Step: 122180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:38,682-Speed 3285.33 samples/sec Loss 4.4700 LearningRate 0.0258 Epoch: 9 Global Step: 122190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:41,834-Speed 3250.14 samples/sec Loss 4.3699 LearningRate 0.0258 Epoch: 9 Global Step: 122200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:44,903-Speed 3336.64 samples/sec Loss 4.3987 LearningRate 0.0258 Epoch: 9 Global Step: 122210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:47,995-Speed 3314.18 samples/sec Loss 4.4306 LearningRate 0.0258 Epoch: 9 Global Step: 122220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:51,095-Speed 3303.30 samples/sec Loss 4.3757 LearningRate 0.0258 Epoch: 9 Global Step: 122230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:17:54,153-Speed 3350.03 samples/sec Loss 4.4375 LearningRate 0.0258 Epoch: 9 Global Step: 122240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:17:57,265-Speed 3291.58 samples/sec Loss 4.4085 LearningRate 0.0258 Epoch: 9 Global Step: 122250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
12:18:00,324-Speed 3348.62 samples/sec Loss 4.3264 LearningRate 0.0258 Epoch: 9 Global Step: 122260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:18:03,413-Speed 3316.35 samples/sec Loss 4.3523 LearningRate 0.0258 Epoch: 9 Global Step: 122270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:06,617-Speed 3196.38 samples/sec Loss 4.4552 LearningRate 0.0258 Epoch: 9 Global Step: 122280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:09,727-Speed 3293.84 samples/sec Loss 4.3689 LearningRate 0.0258 Epoch: 9 Global Step: 122290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:13,014-Speed 3116.35 samples/sec Loss 4.4049 LearningRate 0.0258 Epoch: 9 Global Step: 122300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:16,120-Speed 3297.63 samples/sec Loss 4.3761 LearningRate 0.0258 Epoch: 9 Global Step: 122310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:19,236-Speed 3287.24 samples/sec Loss 4.4929 LearningRate 0.0258 Epoch: 9 Global Step: 122320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:22,296-Speed 3348.41 samples/sec Loss 4.3248 LearningRate 0.0258 Epoch: 9 Global Step: 122330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:25,468-Speed 3228.53 samples/sec Loss 4.4021 LearningRate 0.0258 Epoch: 9 Global Step: 122340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:28,661-Speed 3207.80 samples/sec Loss 4.4762 LearningRate 0.0258 Epoch: 9 Global Step: 122350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:18:31,751-Speed 3315.84 samples/sec Loss 4.5235 LearningRate 0.0258 Epoch: 9 Global Step: 122360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:34,812-Speed 3346.10 samples/sec Loss 4.4550 LearningRate 0.0257 Epoch: 9 Global Step: 122370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:37,930-Speed 3285.65 samples/sec Loss 
4.4358 LearningRate 0.0257 Epoch: 9 Global Step: 122380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:40,989-Speed 3347.90 samples/sec Loss 4.4172 LearningRate 0.0257 Epoch: 9 Global Step: 122390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:44,107-Speed 3284.69 samples/sec Loss 4.4089 LearningRate 0.0257 Epoch: 9 Global Step: 122400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:47,183-Speed 3330.22 samples/sec Loss 4.3966 LearningRate 0.0257 Epoch: 9 Global Step: 122410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:50,316-Speed 3269.87 samples/sec Loss 4.3759 LearningRate 0.0257 Epoch: 9 Global Step: 122420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:53,399-Speed 3322.20 samples/sec Loss 4.3163 LearningRate 0.0257 Epoch: 9 Global Step: 122430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:56,511-Speed 3292.05 samples/sec Loss 4.3655 LearningRate 0.0257 Epoch: 9 Global Step: 122440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:18:59,621-Speed 3293.91 samples/sec Loss 4.4194 LearningRate 0.0257 Epoch: 9 Global Step: 122450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:02,757-Speed 3266.23 samples/sec Loss 4.4169 LearningRate 0.0257 Epoch: 9 Global Step: 122460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:05,920-Speed 3238.91 samples/sec Loss 4.4469 LearningRate 0.0257 Epoch: 9 Global Step: 122470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:09,009-Speed 3315.73 samples/sec Loss 4.4762 LearningRate 0.0257 Epoch: 9 Global Step: 122480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:12,141-Speed 3270.17 samples/sec Loss 4.3690 LearningRate 0.0257 Epoch: 9 Global Step: 122490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:15,325-Speed 3216.72 samples/sec Loss 4.5022 LearningRate 0.0257 Epoch: 9 Global 
Step: 122500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:18,430-Speed 3299.26 samples/sec Loss 4.4961 LearningRate 0.0257 Epoch: 9 Global Step: 122510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:21,511-Speed 3324.37 samples/sec Loss 4.3980 LearningRate 0.0257 Epoch: 9 Global Step: 122520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:24,691-Speed 3220.86 samples/sec Loss 4.3826 LearningRate 0.0257 Epoch: 9 Global Step: 122530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:27,882-Speed 3210.72 samples/sec Loss 4.3921 LearningRate 0.0257 Epoch: 9 Global Step: 122540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:31,005-Speed 3280.35 samples/sec Loss 4.4047 LearningRate 0.0257 Epoch: 9 Global Step: 122550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:19:34,096-Speed 3313.03 samples/sec Loss 4.4526 LearningRate 0.0257 Epoch: 9 Global Step: 122560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:19:37,231-Speed 3267.93 samples/sec Loss 4.5122 LearningRate 0.0257 Epoch: 9 Global Step: 122570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:19:40,417-Speed 3215.61 samples/sec Loss 4.4397 LearningRate 0.0257 Epoch: 9 Global Step: 122580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:19:43,623-Speed 3194.53 samples/sec Loss 4.4436 LearningRate 0.0257 Epoch: 9 Global Step: 122590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:19:46,760-Speed 3265.29 samples/sec Loss 4.3162 LearningRate 0.0257 Epoch: 9 Global Step: 122600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:19:49,911-Speed 3251.24 samples/sec Loss 4.3988 LearningRate 0.0256 Epoch: 9 Global Step: 122610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:19:53,090-Speed 3222.08 samples/sec Loss 4.4070 LearningRate 0.0256 Epoch: 9 Global Step: 122620 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 12:19:56,199-Speed 3294.92 samples/sec Loss 4.4103 LearningRate 0.0256 Epoch: 9 Global Step: 122630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:19:59,271-Speed 3334.05 samples/sec Loss 4.4725 LearningRate 0.0256 Epoch: 9 Global Step: 122640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:20:02,390-Speed 3284.16 samples/sec Loss 4.4055 LearningRate 0.0256 Epoch: 9 Global Step: 122650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:20:05,469-Speed 3327.66 samples/sec Loss 4.4599 LearningRate 0.0256 Epoch: 9 Global Step: 122660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:20:08,615-Speed 3255.87 samples/sec Loss 4.4865 LearningRate 0.0256 Epoch: 9 Global Step: 122670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:20:11,764-Speed 3252.07 samples/sec Loss 4.3440 LearningRate 0.0256 Epoch: 9 Global Step: 122680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:20:14,992-Speed 3173.19 samples/sec Loss 4.3005 LearningRate 0.0256 Epoch: 9 Global Step: 122690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:20:18,159-Speed 3234.45 samples/sec Loss 4.5222 LearningRate 0.0256 Epoch: 9 Global Step: 122700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:20:21,243-Speed 3322.04 samples/sec Loss 4.4077 LearningRate 0.0256 Epoch: 9 Global Step: 122710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:20:24,335-Speed 3313.03 samples/sec Loss 4.3620 LearningRate 0.0256 Epoch: 9 Global Step: 122720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:20:27,453-Speed 3285.00 samples/sec Loss 4.3898 LearningRate 0.0256 Epoch: 9 Global Step: 122730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:20:30,581-Speed 3274.71 samples/sec Loss 4.4112 LearningRate 0.0256 Epoch: 9 Global Step: 122740 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 
12:20:33,675-Speed 3311.23 samples/sec Loss 4.4525 LearningRate 0.0256 Epoch: 9 Global Step: 122750 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:36,757-Speed 3323.31 samples/sec Loss 4.4157 LearningRate 0.0256 Epoch: 9 Global Step: 122760 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:39,940-Speed 3218.52 samples/sec Loss 4.4764 LearningRate 0.0256 Epoch: 9 Global Step: 122770 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:43,167-Speed 3174.70 samples/sec Loss 4.4325 LearningRate 0.0256 Epoch: 9 Global Step: 122780 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:46,260-Speed 3310.85 samples/sec Loss 4.5217 LearningRate 0.0256 Epoch: 9 Global Step: 122790 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:49,451-Speed 3209.93 samples/sec Loss 4.3526 LearningRate 0.0256 Epoch: 9 Global Step: 122800 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:52,583-Speed 3270.85 samples/sec Loss 4.4323 LearningRate 0.0256 Epoch: 9 Global Step: 122810 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:55,664-Speed 3324.51 samples/sec Loss 4.3681 LearningRate 0.0256 Epoch: 9 Global Step: 122820 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:20:58,770-Speed 3297.62 samples/sec Loss 4.3800 LearningRate 0.0256 Epoch: 9 Global Step: 122830 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-27 12:21:01,923-Speed 3249.15 samples/sec Loss 4.4290 LearningRate 0.0256 Epoch: 9 Global Step: 122840 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:05,048-Speed 3277.69 samples/sec Loss 4.4800 LearningRate 0.0256 Epoch: 9 Global Step: 122850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:08,161-Speed 3290.22 samples/sec Loss 4.3579 LearningRate 0.0255 Epoch: 9 Global Step: 122860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:11,270-Speed 3295.06 samples/sec Loss 4.3580 
LearningRate 0.0255 Epoch: 9 Global Step: 122870 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:14,432-Speed 3239.48 samples/sec Loss 4.3526 LearningRate 0.0255 Epoch: 9 Global Step: 122880 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:17,652-Speed 3181.79 samples/sec Loss 4.4679 LearningRate 0.0255 Epoch: 9 Global Step: 122890 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:20,746-Speed 3310.42 samples/sec Loss 4.4945 LearningRate 0.0255 Epoch: 9 Global Step: 122900 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:23,840-Speed 3310.07 samples/sec Loss 4.3540 LearningRate 0.0255 Epoch: 9 Global Step: 122910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:26,960-Speed 3283.66 samples/sec Loss 4.3636 LearningRate 0.0255 Epoch: 9 Global Step: 122920 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:30,053-Speed 3311.29 samples/sec Loss 4.4189 LearningRate 0.0255 Epoch: 9 Global Step: 122930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:21:33,160-Speed 3297.41 samples/sec Loss 4.5517 LearningRate 0.0255 Epoch: 9 Global Step: 122940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:36,266-Speed 3298.32 samples/sec Loss 4.4533 LearningRate 0.0255 Epoch: 9 Global Step: 122950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:39,399-Speed 3268.56 samples/sec Loss 4.4261 LearningRate 0.0255 Epoch: 9 Global Step: 122960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:42,532-Speed 3269.81 samples/sec Loss 4.3051 LearningRate 0.0255 Epoch: 9 Global Step: 122970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:45,629-Speed 3307.53 samples/sec Loss 4.4256 LearningRate 0.0255 Epoch: 9 Global Step: 122980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:48,792-Speed 3238.48 samples/sec Loss 4.3804 LearningRate 0.0255 Epoch: 9 Global Step: 
122990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:51,973-Speed 3219.72 samples/sec Loss 4.4125 LearningRate 0.0255 Epoch: 9 Global Step: 123000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:55,093-Speed 3284.17 samples/sec Loss 4.3557 LearningRate 0.0255 Epoch: 9 Global Step: 123010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:21:58,203-Speed 3293.24 samples/sec Loss 4.4411 LearningRate 0.0255 Epoch: 9 Global Step: 123020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:01,356-Speed 3248.25 samples/sec Loss 4.4118 LearningRate 0.0255 Epoch: 9 Global Step: 123030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:04,434-Speed 3328.56 samples/sec Loss 4.4672 LearningRate 0.0255 Epoch: 9 Global Step: 123040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:22:07,532-Speed 3306.36 samples/sec Loss 4.5093 LearningRate 0.0255 Epoch: 9 Global Step: 123050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:10,621-Speed 3315.10 samples/sec Loss 4.4235 LearningRate 0.0255 Epoch: 9 Global Step: 123060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:13,764-Speed 3259.48 samples/sec Loss 4.2839 LearningRate 0.0255 Epoch: 9 Global Step: 123070 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:16,947-Speed 3218.14 samples/sec Loss 4.3864 LearningRate 0.0255 Epoch: 9 Global Step: 123080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:20,059-Speed 3293.12 samples/sec Loss 4.4330 LearningRate 0.0255 Epoch: 9 Global Step: 123090 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:23,161-Speed 3302.54 samples/sec Loss 4.4710 LearningRate 0.0254 Epoch: 9 Global Step: 123100 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:26,956-Speed 2699.00 samples/sec Loss 4.3627 LearningRate 0.0254 Epoch: 9 Global Step: 123110 Fp16 Grad Scale: 16384 Required: 11 
hours Training: 2022-04-27 12:22:30,063-Speed 3297.51 samples/sec Loss 4.3604 LearningRate 0.0254 Epoch: 9 Global Step: 123120 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:33,192-Speed 3272.91 samples/sec Loss 4.5977 LearningRate 0.0254 Epoch: 9 Global Step: 123130 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:36,339-Speed 3255.81 samples/sec Loss 4.3616 LearningRate 0.0254 Epoch: 9 Global Step: 123140 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:22:39,479-Speed 3261.93 samples/sec Loss 4.4401 LearningRate 0.0254 Epoch: 9 Global Step: 123150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:42,641-Speed 3239.64 samples/sec Loss 4.3520 LearningRate 0.0254 Epoch: 9 Global Step: 123160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:45,764-Speed 3279.70 samples/sec Loss 4.4084 LearningRate 0.0254 Epoch: 9 Global Step: 123170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:48,893-Speed 3273.59 samples/sec Loss 4.2968 LearningRate 0.0254 Epoch: 9 Global Step: 123180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:52,007-Speed 3289.51 samples/sec Loss 4.5339 LearningRate 0.0254 Epoch: 9 Global Step: 123190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:55,163-Speed 3246.23 samples/sec Loss 4.4001 LearningRate 0.0254 Epoch: 9 Global Step: 123200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:22:58,238-Speed 3330.98 samples/sec Loss 4.4690 LearningRate 0.0254 Epoch: 9 Global Step: 123210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:23:01,313-Speed 3330.51 samples/sec Loss 4.4362 LearningRate 0.0254 Epoch: 9 Global Step: 123220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:23:04,430-Speed 3286.39 samples/sec Loss 4.3868 LearningRate 0.0254 Epoch: 9 Global Step: 123230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
12:23:07,553-Speed 3279.93 samples/sec Loss 4.5050 LearningRate 0.0254 Epoch: 9 Global Step: 123240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:23:10,691-Speed 3264.81 samples/sec Loss 4.5163 LearningRate 0.0254 Epoch: 9 Global Step: 123250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:23:13,766-Speed 3330.65 samples/sec Loss 4.4713 LearningRate 0.0254 Epoch: 9 Global Step: 123260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:23:16,862-Speed 3308.44 samples/sec Loss 4.4981 LearningRate 0.0254 Epoch: 9 Global Step: 123270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:23:19,963-Speed 3303.88 samples/sec Loss 4.3665 LearningRate 0.0254 Epoch: 9 Global Step: 123280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:23,109-Speed 3255.60 samples/sec Loss 4.3899 LearningRate 0.0254 Epoch: 9 Global Step: 123290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:26,222-Speed 3290.89 samples/sec Loss 4.3901 LearningRate 0.0254 Epoch: 9 Global Step: 123300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:29,368-Speed 3255.55 samples/sec Loss 4.4918 LearningRate 0.0254 Epoch: 9 Global Step: 123310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:32,462-Speed 3310.47 samples/sec Loss 4.4295 LearningRate 0.0254 Epoch: 9 Global Step: 123320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:35,572-Speed 3293.72 samples/sec Loss 4.5327 LearningRate 0.0254 Epoch: 9 Global Step: 123330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:38,724-Speed 3249.35 samples/sec Loss 4.4081 LearningRate 0.0254 Epoch: 9 Global Step: 123340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:41,843-Speed 3284.78 samples/sec Loss 4.4312 LearningRate 0.0253 Epoch: 9 Global Step: 123350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:44,948-Speed 3298.16 samples/sec Loss 
4.4598 LearningRate 0.0253 Epoch: 9 Global Step: 123360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:48,022-Speed 3332.35 samples/sec Loss 4.3269 LearningRate 0.0253 Epoch: 9 Global Step: 123370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:23:51,162-Speed 3262.43 samples/sec Loss 4.3914 LearningRate 0.0253 Epoch: 9 Global Step: 123380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:23:54,275-Speed 3290.50 samples/sec Loss 4.3846 LearningRate 0.0253 Epoch: 9 Global Step: 123390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:23:57,380-Speed 3299.16 samples/sec Loss 4.3582 LearningRate 0.0253 Epoch: 9 Global Step: 123400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:00,557-Speed 3224.27 samples/sec Loss 4.3911 LearningRate 0.0253 Epoch: 9 Global Step: 123410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:05,570-Speed 2043.17 samples/sec Loss 4.4243 LearningRate 0.0253 Epoch: 9 Global Step: 123420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:08,669-Speed 3305.68 samples/sec Loss 4.4273 LearningRate 0.0253 Epoch: 9 Global Step: 123430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:13,235-Speed 2243.42 samples/sec Loss 4.4859 LearningRate 0.0253 Epoch: 9 Global Step: 123440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:16,385-Speed 3251.56 samples/sec Loss 4.2293 LearningRate 0.0253 Epoch: 9 Global Step: 123450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:19,475-Speed 3315.01 samples/sec Loss 4.3088 LearningRate 0.0253 Epoch: 9 Global Step: 123460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:22,549-Speed 3332.23 samples/sec Loss 4.4804 LearningRate 0.0253 Epoch: 9 Global Step: 123470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:25,665-Speed 3287.74 samples/sec Loss 4.3976 LearningRate 0.0253 Epoch: 9 Global 
Step: 123480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:24:28,807-Speed 3259.23 samples/sec Loss 4.4298 LearningRate 0.0253 Epoch: 9 Global Step: 123490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:31,951-Speed 3258.39 samples/sec Loss 4.4461 LearningRate 0.0253 Epoch: 9 Global Step: 123500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:35,041-Speed 3314.58 samples/sec Loss 4.3867 LearningRate 0.0253 Epoch: 9 Global Step: 123510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:38,142-Speed 3303.67 samples/sec Loss 4.3955 LearningRate 0.0253 Epoch: 9 Global Step: 123520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:41,273-Speed 3271.53 samples/sec Loss 4.4870 LearningRate 0.0253 Epoch: 9 Global Step: 123530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:44,400-Speed 3275.64 samples/sec Loss 4.4470 LearningRate 0.0253 Epoch: 9 Global Step: 123540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:47,574-Speed 3227.69 samples/sec Loss 4.3697 LearningRate 0.0253 Epoch: 9 Global Step: 123550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:50,703-Speed 3273.04 samples/sec Loss 4.3028 LearningRate 0.0253 Epoch: 9 Global Step: 123560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:53,830-Speed 3276.28 samples/sec Loss 4.4534 LearningRate 0.0253 Epoch: 9 Global Step: 123570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:24:56,942-Speed 3290.94 samples/sec Loss 4.3059 LearningRate 0.0253 Epoch: 9 Global Step: 123580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:00,036-Speed 3310.34 samples/sec Loss 4.4788 LearningRate 0.0253 Epoch: 9 Global Step: 123590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:03,178-Speed 3260.04 samples/sec Loss 4.4665 LearningRate 0.0252 Epoch: 9 Global Step: 123600 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-27 12:25:06,242-Speed 3343.30 samples/sec Loss 4.5271 LearningRate 0.0252 Epoch: 9 Global Step: 123610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:09,313-Speed 3335.86 samples/sec Loss 4.3467 LearningRate 0.0252 Epoch: 9 Global Step: 123620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:12,459-Speed 3255.81 samples/sec Loss 4.3960 LearningRate 0.0252 Epoch: 9 Global Step: 123630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:15,657-Speed 3203.31 samples/sec Loss 4.3684 LearningRate 0.0252 Epoch: 9 Global Step: 123640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:18,806-Speed 3252.41 samples/sec Loss 4.5073 LearningRate 0.0252 Epoch: 9 Global Step: 123650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:21,921-Speed 3288.86 samples/sec Loss 4.3910 LearningRate 0.0252 Epoch: 9 Global Step: 123660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:25,001-Speed 3325.86 samples/sec Loss 4.4117 LearningRate 0.0252 Epoch: 9 Global Step: 123670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:28,112-Speed 3291.61 samples/sec Loss 4.3973 LearningRate 0.0252 Epoch: 9 Global Step: 123680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:31,248-Speed 3266.35 samples/sec Loss 4.3378 LearningRate 0.0252 Epoch: 9 Global Step: 123690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:25:34,335-Speed 3318.32 samples/sec Loss 4.2961 LearningRate 0.0252 Epoch: 9 Global Step: 123700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:37,461-Speed 3276.35 samples/sec Loss 4.3859 LearningRate 0.0252 Epoch: 9 Global Step: 123710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:25:40,627-Speed 3236.11 samples/sec Loss 4.3694 LearningRate 0.0252 Epoch: 9 Global Step: 123720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
12:25:43,704-Speed 3328.94 samples/sec Loss 4.4444 LearningRate 0.0252 Epoch: 9 Global Step: 123730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:25:46,789-Speed 3320.06 samples/sec Loss 4.4151 LearningRate 0.0252 Epoch: 9 Global Step: 123740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:25:49,917-Speed 3274.20 samples/sec Loss 4.4461 LearningRate 0.0252 Epoch: 9 Global Step: 123750 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:25:53,071-Speed 3247.58 samples/sec Loss 4.4735 LearningRate 0.0252 Epoch: 9 Global Step: 123760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:25:56,169-Speed 3307.36 samples/sec Loss 4.4082 LearningRate 0.0252 Epoch: 9 Global Step: 123770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:25:59,261-Speed 3312.82 samples/sec Loss 4.3897 LearningRate 0.0252 Epoch: 9 Global Step: 123780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:26:02,345-Speed 3320.54 samples/sec Loss 4.4471 LearningRate 0.0252 Epoch: 9 Global Step: 123790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:26:05,458-Speed 3291.21 samples/sec Loss 4.3669 LearningRate 0.0252 Epoch: 9 Global Step: 123800 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:26:08,590-Speed 3269.90 samples/sec Loss 4.4775 LearningRate 0.0252 Epoch: 9 Global Step: 123810 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:26:11,691-Speed 3303.03 samples/sec Loss 4.3880 LearningRate 0.0252 Epoch: 9 Global Step: 123820 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:26:14,837-Speed 3256.33 samples/sec Loss 4.5150 LearningRate 0.0252 Epoch: 9 Global Step: 123830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:17,968-Speed 3271.60 samples/sec Loss 4.4699 LearningRate 0.0251 Epoch: 9 Global Step: 123840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:21,063-Speed 3309.72 samples/sec Loss 
4.4354 LearningRate 0.0251 Epoch: 9 Global Step: 123850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:24,141-Speed 3328.03 samples/sec Loss 4.4123 LearningRate 0.0251 Epoch: 9 Global Step: 123860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:27,270-Speed 3274.00 samples/sec Loss 4.3710 LearningRate 0.0251 Epoch: 9 Global Step: 123870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:30,375-Speed 3298.38 samples/sec Loss 4.3811 LearningRate 0.0251 Epoch: 9 Global Step: 123880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:33,483-Speed 3295.83 samples/sec Loss 4.3800 LearningRate 0.0251 Epoch: 9 Global Step: 123890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:36,587-Speed 3300.00 samples/sec Loss 4.3779 LearningRate 0.0251 Epoch: 9 Global Step: 123900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:39,744-Speed 3244.18 samples/sec Loss 4.4674 LearningRate 0.0251 Epoch: 9 Global Step: 123910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:42,897-Speed 3249.29 samples/sec Loss 4.4142 LearningRate 0.0251 Epoch: 9 Global Step: 123920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:45,973-Speed 3330.57 samples/sec Loss 4.4583 LearningRate 0.0251 Epoch: 9 Global Step: 123930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:26:49,042-Speed 3336.98 samples/sec Loss 4.5229 LearningRate 0.0251 Epoch: 9 Global Step: 123940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:52,132-Speed 3315.25 samples/sec Loss 4.4304 LearningRate 0.0251 Epoch: 9 Global Step: 123950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:55,219-Speed 3317.80 samples/sec Loss 4.3808 LearningRate 0.0251 Epoch: 9 Global Step: 123960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:26:58,308-Speed 3315.90 samples/sec Loss 4.4098 LearningRate 0.0251 Epoch: 9 Global 
Step: 123970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:01,408-Speed 3304.25 samples/sec Loss 4.3904 LearningRate 0.0251 Epoch: 9 Global Step: 123980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:04,554-Speed 3255.80 samples/sec Loss 4.4163 LearningRate 0.0251 Epoch: 9 Global Step: 123990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:07,697-Speed 3259.31 samples/sec Loss 4.3611 LearningRate 0.0251 Epoch: 9 Global Step: 124000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:10,770-Speed 3332.73 samples/sec Loss 4.4298 LearningRate 0.0251 Epoch: 9 Global Step: 124010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:13,898-Speed 3275.05 samples/sec Loss 4.3987 LearningRate 0.0251 Epoch: 9 Global Step: 124020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:17,069-Speed 3229.98 samples/sec Loss 4.4060 LearningRate 0.0251 Epoch: 9 Global Step: 124030 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:20,140-Speed 3335.76 samples/sec Loss 4.4094 LearningRate 0.0251 Epoch: 9 Global Step: 124040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:23,229-Speed 3315.65 samples/sec Loss 4.4697 LearningRate 0.0251 Epoch: 9 Global Step: 124050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:26,305-Speed 3330.26 samples/sec Loss 4.4291 LearningRate 0.0251 Epoch: 9 Global Step: 124060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:29,376-Speed 3334.81 samples/sec Loss 4.3710 LearningRate 0.0251 Epoch: 9 Global Step: 124070 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:32,459-Speed 3323.13 samples/sec Loss 4.4293 LearningRate 0.0251 Epoch: 9 Global Step: 124080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:27:35,560-Speed 3303.15 samples/sec Loss 4.4672 LearningRate 0.0250 Epoch: 9 Global Step: 124090 Fp16 Grad Scale: 16384 
Required: 11 hours Training: 2022-04-27 12:27:38,652-Speed 3312.13 samples/sec Loss 4.3366 LearningRate 0.0250 Epoch: 9 Global Step: 124100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:41,756-Speed 3299.91 samples/sec Loss 4.4760 LearningRate 0.0250 Epoch: 9 Global Step: 124110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:44,871-Speed 3289.14 samples/sec Loss 4.3086 LearningRate 0.0250 Epoch: 9 Global Step: 124120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:48,003-Speed 3269.57 samples/sec Loss 4.4152 LearningRate 0.0250 Epoch: 9 Global Step: 124130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:51,099-Speed 3309.02 samples/sec Loss 4.5197 LearningRate 0.0250 Epoch: 9 Global Step: 124140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:54,294-Speed 3205.26 samples/sec Loss 4.4386 LearningRate 0.0250 Epoch: 9 Global Step: 124150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:27:57,377-Speed 3322.70 samples/sec Loss 4.4194 LearningRate 0.0250 Epoch: 9 Global Step: 124160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:00,516-Speed 3263.43 samples/sec Loss 4.4124 LearningRate 0.0250 Epoch: 9 Global Step: 124170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:03,706-Speed 3211.57 samples/sec Loss 4.3451 LearningRate 0.0250 Epoch: 9 Global Step: 124180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:06,807-Speed 3302.97 samples/sec Loss 4.3995 LearningRate 0.0250 Epoch: 9 Global Step: 124190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:10,137-Speed 3076.21 samples/sec Loss 4.3925 LearningRate 0.0250 Epoch: 9 Global Step: 124200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:28:13,280-Speed 3258.62 samples/sec Loss 4.3578 LearningRate 0.0250 Epoch: 9 Global Step: 124210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
12:28:45,566-Speed 317.18 samples/sec Loss 3.2904 LearningRate 0.0250 Epoch: 10 Global Step: 124220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:49,015-Speed 2970.48 samples/sec Loss 3.1884 LearningRate 0.0250 Epoch: 10 Global Step: 124230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:52,083-Speed 3339.15 samples/sec Loss 3.1716 LearningRate 0.0250 Epoch: 10 Global Step: 124240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:55,219-Speed 3266.24 samples/sec Loss 3.3233 LearningRate 0.0250 Epoch: 10 Global Step: 124250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:28:58,323-Speed 3300.38 samples/sec Loss 3.3085 LearningRate 0.0250 Epoch: 10 Global Step: 124260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:01,470-Speed 3255.17 samples/sec Loss 3.1765 LearningRate 0.0250 Epoch: 10 Global Step: 124270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:04,705-Speed 3165.91 samples/sec Loss 3.2771 LearningRate 0.0250 Epoch: 10 Global Step: 124280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:07,774-Speed 3338.16 samples/sec Loss 3.2770 LearningRate 0.0250 Epoch: 10 Global Step: 124290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:10,874-Speed 3303.99 samples/sec Loss 3.2021 LearningRate 0.0250 Epoch: 10 Global Step: 124300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:14,006-Speed 3270.74 samples/sec Loss 3.2753 LearningRate 0.0250 Epoch: 10 Global Step: 124310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:17,186-Speed 3221.20 samples/sec Loss 3.3094 LearningRate 0.0250 Epoch: 10 Global Step: 124320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:20,383-Speed 3203.19 samples/sec Loss 3.3115 LearningRate 0.0250 Epoch: 10 Global Step: 124330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:23,614-Speed 3170.11 
samples/sec Loss 3.2967 LearningRate 0.0249 Epoch: 10 Global Step: 124340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:27,145-Speed 2900.85 samples/sec Loss 3.2978 LearningRate 0.0249 Epoch: 10 Global Step: 124350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:30,250-Speed 3299.43 samples/sec Loss 3.2396 LearningRate 0.0249 Epoch: 10 Global Step: 124360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:33,338-Speed 3317.03 samples/sec Loss 3.2210 LearningRate 0.0249 Epoch: 10 Global Step: 124370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:36,413-Speed 3331.32 samples/sec Loss 3.2125 LearningRate 0.0249 Epoch: 10 Global Step: 124380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:39,503-Speed 3314.59 samples/sec Loss 3.2909 LearningRate 0.0249 Epoch: 10 Global Step: 124390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:29:42,605-Speed 3302.08 samples/sec Loss 3.2439 LearningRate 0.0249 Epoch: 10 Global Step: 124400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:45,661-Speed 3352.71 samples/sec Loss 3.2805 LearningRate 0.0249 Epoch: 10 Global Step: 124410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:48,787-Speed 3276.83 samples/sec Loss 3.2351 LearningRate 0.0249 Epoch: 10 Global Step: 124420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:51,902-Speed 3288.07 samples/sec Loss 3.3234 LearningRate 0.0249 Epoch: 10 Global Step: 124430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:54,999-Speed 3307.66 samples/sec Loss 3.2574 LearningRate 0.0249 Epoch: 10 Global Step: 124440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:29:58,084-Speed 3319.72 samples/sec Loss 3.2988 LearningRate 0.0249 Epoch: 10 Global Step: 124450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:01,286-Speed 3199.80 samples/sec Loss 3.3043 
LearningRate 0.0249 Epoch: 10 Global Step: 124460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:04,426-Speed 3262.12 samples/sec Loss 3.2900 LearningRate 0.0249 Epoch: 10 Global Step: 124470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:07,549-Speed 3280.04 samples/sec Loss 3.2190 LearningRate 0.0249 Epoch: 10 Global Step: 124480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:10,637-Speed 3316.69 samples/sec Loss 3.2957 LearningRate 0.0249 Epoch: 10 Global Step: 124490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:13,779-Speed 3260.32 samples/sec Loss 3.3040 LearningRate 0.0249 Epoch: 10 Global Step: 124500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:30:16,851-Speed 3334.41 samples/sec Loss 3.3193 LearningRate 0.0249 Epoch: 10 Global Step: 124510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:19,948-Speed 3306.88 samples/sec Loss 3.2948 LearningRate 0.0249 Epoch: 10 Global Step: 124520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:23,073-Speed 3278.09 samples/sec Loss 3.2661 LearningRate 0.0249 Epoch: 10 Global Step: 124530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:26,183-Speed 3293.96 samples/sec Loss 3.2742 LearningRate 0.0249 Epoch: 10 Global Step: 124540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:29,321-Speed 3263.58 samples/sec Loss 3.3546 LearningRate 0.0249 Epoch: 10 Global Step: 124550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:30:32,425-Speed 3300.51 samples/sec Loss 3.3065 LearningRate 0.0249 Epoch: 10 Global Step: 124560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:35,556-Speed 3271.72 samples/sec Loss 3.3085 LearningRate 0.0249 Epoch: 10 Global Step: 124570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:38,682-Speed 3276.87 samples/sec Loss 3.3426 LearningRate 0.0249 Epoch: 10 
Global Step: 124580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:41,773-Speed 3314.21 samples/sec Loss 3.3420 LearningRate 0.0248 Epoch: 10 Global Step: 124590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:44,899-Speed 3276.44 samples/sec Loss 3.3046 LearningRate 0.0248 Epoch: 10 Global Step: 124600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:48,143-Speed 3157.96 samples/sec Loss 3.1862 LearningRate 0.0248 Epoch: 10 Global Step: 124610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:51,247-Speed 3300.16 samples/sec Loss 3.2005 LearningRate 0.0248 Epoch: 10 Global Step: 124620 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:54,323-Speed 3330.05 samples/sec Loss 3.2237 LearningRate 0.0248 Epoch: 10 Global Step: 124630 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:30:57,439-Speed 3287.39 samples/sec Loss 3.3320 LearningRate 0.0248 Epoch: 10 Global Step: 124640 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:31:00,606-Speed 3233.77 samples/sec Loss 3.2814 LearningRate 0.0248 Epoch: 10 Global Step: 124650 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:31:03,683-Speed 3328.96 samples/sec Loss 3.3703 LearningRate 0.0248 Epoch: 10 Global Step: 124660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:06,790-Speed 3296.91 samples/sec Loss 3.3123 LearningRate 0.0248 Epoch: 10 Global Step: 124670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:09,900-Speed 3294.29 samples/sec Loss 3.3358 LearningRate 0.0248 Epoch: 10 Global Step: 124680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:13,099-Speed 3201.38 samples/sec Loss 3.2656 LearningRate 0.0248 Epoch: 10 Global Step: 124690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:16,254-Speed 3246.66 samples/sec Loss 3.4016 LearningRate 0.0248 Epoch: 10 Global Step: 124700 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:19,313-Speed 3348.33 samples/sec Loss 3.3347 LearningRate 0.0248 Epoch: 10 Global Step: 124710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:22,371-Speed 3350.31 samples/sec Loss 3.2963 LearningRate 0.0248 Epoch: 10 Global Step: 124720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:25,493-Speed 3280.66 samples/sec Loss 3.4010 LearningRate 0.0248 Epoch: 10 Global Step: 124730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:28,615-Speed 3280.80 samples/sec Loss 3.2851 LearningRate 0.0248 Epoch: 10 Global Step: 124740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:31,724-Speed 3295.33 samples/sec Loss 3.3592 LearningRate 0.0248 Epoch: 10 Global Step: 124750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:34,853-Speed 3273.69 samples/sec Loss 3.3796 LearningRate 0.0248 Epoch: 10 Global Step: 124760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:31:37,959-Speed 3298.12 samples/sec Loss 3.3690 LearningRate 0.0248 Epoch: 10 Global Step: 124770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:41,047-Speed 3316.45 samples/sec Loss 3.3842 LearningRate 0.0248 Epoch: 10 Global Step: 124780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:44,115-Speed 3338.50 samples/sec Loss 3.3471 LearningRate 0.0248 Epoch: 10 Global Step: 124790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:47,234-Speed 3284.67 samples/sec Loss 3.3034 LearningRate 0.0248 Epoch: 10 Global Step: 124800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:50,394-Speed 3241.82 samples/sec Loss 3.3522 LearningRate 0.0248 Epoch: 10 Global Step: 124810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:53,618-Speed 3176.11 samples/sec Loss 3.3100 LearningRate 0.0248 Epoch: 10 Global Step: 124820 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 12:31:56,700-Speed 3324.23 samples/sec Loss 3.3537 LearningRate 0.0248 Epoch: 10 Global Step: 124830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:31:59,861-Speed 3240.09 samples/sec Loss 3.3482 LearningRate 0.0247 Epoch: 10 Global Step: 124840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:03,035-Speed 3228.04 samples/sec Loss 3.3538 LearningRate 0.0247 Epoch: 10 Global Step: 124850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:06,228-Speed 3207.65 samples/sec Loss 3.3770 LearningRate 0.0247 Epoch: 10 Global Step: 124860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:09,324-Speed 3308.54 samples/sec Loss 3.2503 LearningRate 0.0247 Epoch: 10 Global Step: 124870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:12,423-Speed 3306.07 samples/sec Loss 3.3509 LearningRate 0.0247 Epoch: 10 Global Step: 124880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:15,533-Speed 3293.85 samples/sec Loss 3.3466 LearningRate 0.0247 Epoch: 10 Global Step: 124890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:18,718-Speed 3215.48 samples/sec Loss 3.3963 LearningRate 0.0247 Epoch: 10 Global Step: 124900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:21,805-Speed 3318.60 samples/sec Loss 3.3460 LearningRate 0.0247 Epoch: 10 Global Step: 124910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:24,904-Speed 3304.98 samples/sec Loss 3.3706 LearningRate 0.0247 Epoch: 10 Global Step: 124920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:28,038-Speed 3268.87 samples/sec Loss 3.3562 LearningRate 0.0247 Epoch: 10 Global Step: 124930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:31,119-Speed 3324.28 samples/sec Loss 3.4049 LearningRate 0.0247 Epoch: 10 Global Step: 124940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
12:32:34,214-Speed 3309.98 samples/sec Loss 3.3018 LearningRate 0.0247 Epoch: 10 Global Step: 124950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:37,303-Speed 3315.71 samples/sec Loss 3.3513 LearningRate 0.0247 Epoch: 10 Global Step: 124960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:32:40,399-Speed 3308.21 samples/sec Loss 3.3959 LearningRate 0.0247 Epoch: 10 Global Step: 124970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:32:43,519-Speed 3283.24 samples/sec Loss 3.4005 LearningRate 0.0247 Epoch: 10 Global Step: 124980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:32:46,629-Speed 3294.01 samples/sec Loss 3.3635 LearningRate 0.0247 Epoch: 10 Global Step: 124990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:32:49,704-Speed 3330.63 samples/sec Loss 3.4289 LearningRate 0.0247 Epoch: 10 Global Step: 125000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:32:52,843-Speed 3263.66 samples/sec Loss 3.4386 LearningRate 0.0247 Epoch: 10 Global Step: 125010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:32:55,941-Speed 3305.63 samples/sec Loss 3.3746 LearningRate 0.0247 Epoch: 10 Global Step: 125020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:32:59,021-Speed 3325.97 samples/sec Loss 3.3545 LearningRate 0.0247 Epoch: 10 Global Step: 125030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:02,122-Speed 3303.96 samples/sec Loss 3.4241 LearningRate 0.0247 Epoch: 10 Global Step: 125040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:05,239-Speed 3286.17 samples/sec Loss 3.2875 LearningRate 0.0247 Epoch: 10 Global Step: 125050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:08,329-Speed 3314.61 samples/sec Loss 3.3971 LearningRate 0.0247 Epoch: 10 Global Step: 125060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:11,441-Speed 3291.86 
samples/sec Loss 3.3924 LearningRate 0.0247 Epoch: 10 Global Step: 125070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:14,547-Speed 3297.63 samples/sec Loss 3.2876 LearningRate 0.0247 Epoch: 10 Global Step: 125080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:17,660-Speed 3290.53 samples/sec Loss 3.3561 LearningRate 0.0246 Epoch: 10 Global Step: 125090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:20,779-Speed 3283.98 samples/sec Loss 3.3917 LearningRate 0.0246 Epoch: 10 Global Step: 125100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:23,874-Speed 3310.09 samples/sec Loss 3.4769 LearningRate 0.0246 Epoch: 10 Global Step: 125110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:26,983-Speed 3294.85 samples/sec Loss 3.4027 LearningRate 0.0246 Epoch: 10 Global Step: 125120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:30,158-Speed 3226.09 samples/sec Loss 3.4287 LearningRate 0.0246 Epoch: 10 Global Step: 125130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:33:33,211-Speed 3354.84 samples/sec Loss 3.4128 LearningRate 0.0246 Epoch: 10 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:33:36,304-Speed 3311.58 samples/sec Loss 3.3821 LearningRate 0.0246 Epoch: 10 Global Step: 125150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:33:39,357-Speed 3355.94 samples/sec Loss 3.4261 LearningRate 0.0246 Epoch: 10 Global Step: 125160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:42,466-Speed 3294.27 samples/sec Loss 3.3901 LearningRate 0.0246 Epoch: 10 Global Step: 125170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:45,571-Speed 3299.33 samples/sec Loss 3.4565 LearningRate 0.0246 Epoch: 10 Global Step: 125180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:48,687-Speed 3286.23 samples/sec Loss 3.4289 
LearningRate 0.0246 Epoch: 10 Global Step: 125190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:51,810-Speed 3279.96 samples/sec Loss 3.4317 LearningRate 0.0246 Epoch: 10 Global Step: 125200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:54,952-Speed 3260.90 samples/sec Loss 3.4279 LearningRate 0.0246 Epoch: 10 Global Step: 125210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:33:58,025-Speed 3332.73 samples/sec Loss 3.3495 LearningRate 0.0246 Epoch: 10 Global Step: 125220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:01,103-Speed 3328.09 samples/sec Loss 3.4294 LearningRate 0.0246 Epoch: 10 Global Step: 125230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:04,246-Speed 3258.34 samples/sec Loss 3.3659 LearningRate 0.0246 Epoch: 10 Global Step: 125240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:07,438-Speed 3210.13 samples/sec Loss 3.4740 LearningRate 0.0246 Epoch: 10 Global Step: 125250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:10,526-Speed 3316.29 samples/sec Loss 3.4115 LearningRate 0.0246 Epoch: 10 Global Step: 125260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:13,614-Speed 3317.35 samples/sec Loss 3.4752 LearningRate 0.0246 Epoch: 10 Global Step: 125270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:16,744-Speed 3272.37 samples/sec Loss 3.3866 LearningRate 0.0246 Epoch: 10 Global Step: 125280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:19,846-Speed 3302.37 samples/sec Loss 3.5338 LearningRate 0.0246 Epoch: 10 Global Step: 125290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:22,952-Speed 3298.36 samples/sec Loss 3.4608 LearningRate 0.0246 Epoch: 10 Global Step: 125300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:26,039-Speed 3317.82 samples/sec Loss 3.4149 LearningRate 0.0246 Epoch: 10 
Global Step: 125310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:29,106-Speed 3340.09 samples/sec Loss 3.4521 LearningRate 0.0246 Epoch: 10 Global Step: 125320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:34:32,191-Speed 3319.96 samples/sec Loss 3.3503 LearningRate 0.0246 Epoch: 10 Global Step: 125330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:35,293-Speed 3302.18 samples/sec Loss 3.3891 LearningRate 0.0245 Epoch: 10 Global Step: 125340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:38,350-Speed 3350.85 samples/sec Loss 3.4712 LearningRate 0.0245 Epoch: 10 Global Step: 125350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:41,522-Speed 3229.48 samples/sec Loss 3.3901 LearningRate 0.0245 Epoch: 10 Global Step: 125360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:44,594-Speed 3334.49 samples/sec Loss 3.3583 LearningRate 0.0245 Epoch: 10 Global Step: 125370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:47,655-Speed 3346.05 samples/sec Loss 3.5154 LearningRate 0.0245 Epoch: 10 Global Step: 125380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:50,788-Speed 3269.35 samples/sec Loss 3.4783 LearningRate 0.0245 Epoch: 10 Global Step: 125390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:53,855-Speed 3339.84 samples/sec Loss 3.4526 LearningRate 0.0245 Epoch: 10 Global Step: 125400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:34:56,981-Speed 3277.13 samples/sec Loss 3.4328 LearningRate 0.0245 Epoch: 10 Global Step: 125410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:00,081-Speed 3304.06 samples/sec Loss 3.4371 LearningRate 0.0245 Epoch: 10 Global Step: 125420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:03,270-Speed 3212.31 samples/sec Loss 3.4349 LearningRate 0.0245 Epoch: 10 Global Step: 125430 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-27 12:35:06,397-Speed 3275.69 samples/sec Loss 3.4910 LearningRate 0.0245 Epoch: 10 Global Step: 125440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:09,450-Speed 3354.89 samples/sec Loss 3.5255 LearningRate 0.0245 Epoch: 10 Global Step: 125450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:12,542-Speed 3313.03 samples/sec Loss 3.4889 LearningRate 0.0245 Epoch: 10 Global Step: 125460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:15,631-Speed 3316.42 samples/sec Loss 3.4737 LearningRate 0.0245 Epoch: 10 Global Step: 125470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:18,798-Speed 3234.60 samples/sec Loss 3.4599 LearningRate 0.0245 Epoch: 10 Global Step: 125480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:21,848-Speed 3357.69 samples/sec Loss 3.4882 LearningRate 0.0245 Epoch: 10 Global Step: 125490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:24,939-Speed 3314.73 samples/sec Loss 3.5419 LearningRate 0.0245 Epoch: 10 Global Step: 125500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:28,014-Speed 3330.70 samples/sec Loss 3.4653 LearningRate 0.0245 Epoch: 10 Global Step: 125510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:31,126-Speed 3291.47 samples/sec Loss 3.4969 LearningRate 0.0245 Epoch: 10 Global Step: 125520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:34,195-Speed 3337.65 samples/sec Loss 3.5006 LearningRate 0.0245 Epoch: 10 Global Step: 125530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:37,290-Speed 3309.65 samples/sec Loss 3.5094 LearningRate 0.0245 Epoch: 10 Global Step: 125540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:40,356-Speed 3340.89 samples/sec Loss 3.4451 LearningRate 0.0245 Epoch: 10 Global Step: 125550 Fp16 Grad Scale: 16384 Required: 11 hours 
Training: 2022-04-27 12:35:43,465-Speed 3295.10 samples/sec Loss 3.4547 LearningRate 0.0245 Epoch: 10 Global Step: 125560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:46,578-Speed 3289.76 samples/sec Loss 3.4016 LearningRate 0.0245 Epoch: 10 Global Step: 125570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:49,656-Speed 3328.02 samples/sec Loss 3.4447 LearningRate 0.0245 Epoch: 10 Global Step: 125580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:52,735-Speed 3327.41 samples/sec Loss 3.4657 LearningRate 0.0244 Epoch: 10 Global Step: 125590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:35:55,813-Speed 3327.67 samples/sec Loss 3.4816 LearningRate 0.0244 Epoch: 10 Global Step: 125600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:35:58,925-Speed 3291.12 samples/sec Loss 3.5428 LearningRate 0.0244 Epoch: 10 Global Step: 125610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:36:02,070-Speed 3257.72 samples/sec Loss 3.4517 LearningRate 0.0244 Epoch: 10 Global Step: 125620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:36:05,270-Speed 3200.87 samples/sec Loss 3.4850 LearningRate 0.0244 Epoch: 10 Global Step: 125630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:36:08,326-Speed 3351.93 samples/sec Loss 3.4776 LearningRate 0.0244 Epoch: 10 Global Step: 125640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:36:11,391-Speed 3342.18 samples/sec Loss 3.5324 LearningRate 0.0244 Epoch: 10 Global Step: 125650 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:14,503-Speed 3291.50 samples/sec Loss 3.5258 LearningRate 0.0244 Epoch: 10 Global Step: 125660 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:17,563-Speed 3347.57 samples/sec Loss 3.3979 LearningRate 0.0244 Epoch: 10 Global Step: 125670 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
12:36:20,647-Speed 3321.61 samples/sec Loss 3.5063 LearningRate 0.0244 Epoch: 10 Global Step: 125680 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:23,816-Speed 3232.37 samples/sec Loss 3.5165 LearningRate 0.0244 Epoch: 10 Global Step: 125690 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:26,920-Speed 3299.25 samples/sec Loss 3.4796 LearningRate 0.0244 Epoch: 10 Global Step: 125700 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:30,048-Speed 3274.49 samples/sec Loss 3.4772 LearningRate 0.0244 Epoch: 10 Global Step: 125710 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:33,120-Speed 3335.50 samples/sec Loss 3.5675 LearningRate 0.0244 Epoch: 10 Global Step: 125720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:36,200-Speed 3325.15 samples/sec Loss 3.5121 LearningRate 0.0244 Epoch: 10 Global Step: 125730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:39,345-Speed 3256.97 samples/sec Loss 3.4790 LearningRate 0.0244 Epoch: 10 Global Step: 125740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:42,552-Speed 3194.57 samples/sec Loss 3.5215 LearningRate 0.0244 Epoch: 10 Global Step: 125750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:36:45,634-Speed 3322.67 samples/sec Loss 3.5787 LearningRate 0.0244 Epoch: 10 Global Step: 125760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:36:48,715-Speed 3324.67 samples/sec Loss 3.4964 LearningRate 0.0244 Epoch: 10 Global Step: 125770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:51,913-Speed 3202.90 samples/sec Loss 3.5899 LearningRate 0.0244 Epoch: 10 Global Step: 125780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:54,987-Speed 3332.62 samples/sec Loss 3.4916 LearningRate 0.0244 Epoch: 10 Global Step: 125790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:36:58,072-Speed 3319.91 
samples/sec Loss 3.4835 LearningRate 0.0244 Epoch: 10 Global Step: 125800 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:01,161-Speed 3316.53 samples/sec Loss 3.5699 LearningRate 0.0244 Epoch: 10 Global Step: 125810 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:04,245-Speed 3320.82 samples/sec Loss 3.5924 LearningRate 0.0244 Epoch: 10 Global Step: 125820 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:07,310-Speed 3342.36 samples/sec Loss 3.5025 LearningRate 0.0244 Epoch: 10 Global Step: 125830 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:10,389-Speed 3326.67 samples/sec Loss 3.5701 LearningRate 0.0243 Epoch: 10 Global Step: 125840 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:13,556-Speed 3234.59 samples/sec Loss 3.5629 LearningRate 0.0243 Epoch: 10 Global Step: 125850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:16,667-Speed 3293.29 samples/sec Loss 3.4900 LearningRate 0.0243 Epoch: 10 Global Step: 125860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:19,774-Speed 3296.07 samples/sec Loss 3.5316 LearningRate 0.0243 Epoch: 10 Global Step: 125870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:37:22,874-Speed 3304.29 samples/sec Loss 3.5436 LearningRate 0.0243 Epoch: 10 Global Step: 125880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:37:26,013-Speed 3263.67 samples/sec Loss 3.5274 LearningRate 0.0243 Epoch: 10 Global Step: 125890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:37:29,158-Speed 3257.04 samples/sec Loss 3.5941 LearningRate 0.0243 Epoch: 10 Global Step: 125900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:37:32,272-Speed 3288.68 samples/sec Loss 3.5713 LearningRate 0.0243 Epoch: 10 Global Step: 125910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:37:35,343-Speed 3335.98 samples/sec Loss 3.5106 
LearningRate 0.0243 Epoch: 10 Global Step: 125920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:37:38,450-Speed 3296.17 samples/sec Loss 3.5775 LearningRate 0.0243 Epoch: 10 Global Step: 125930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:41,599-Speed 3253.26 samples/sec Loss 3.4912 LearningRate 0.0243 Epoch: 10 Global Step: 125940 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:44,687-Speed 3317.44 samples/sec Loss 3.5759 LearningRate 0.0243 Epoch: 10 Global Step: 125950 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:47,798-Speed 3292.99 samples/sec Loss 3.5012 LearningRate 0.0243 Epoch: 10 Global Step: 125960 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:50,902-Speed 3298.98 samples/sec Loss 3.4995 LearningRate 0.0243 Epoch: 10 Global Step: 125970 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:54,012-Speed 3294.14 samples/sec Loss 3.4608 LearningRate 0.0243 Epoch: 10 Global Step: 125980 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:37:57,078-Speed 3341.51 samples/sec Loss 3.5542 LearningRate 0.0243 Epoch: 10 Global Step: 125990 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:38:00,226-Speed 3253.15 samples/sec Loss 3.5402 LearningRate 0.0243 Epoch: 10 Global Step: 126000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:38:03,396-Speed 3231.59 samples/sec Loss 3.5411 LearningRate 0.0243 Epoch: 10 Global Step: 126010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:38:06,530-Speed 3268.02 samples/sec Loss 3.5546 LearningRate 0.0243 Epoch: 10 Global Step: 126020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:38:09,623-Speed 3312.18 samples/sec Loss 3.5184 LearningRate 0.0243 Epoch: 10 Global Step: 126030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:12,803-Speed 3221.49 samples/sec Loss 3.5461 LearningRate 0.0243 Epoch: 10 
Global Step: 126040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:15,982-Speed 3221.65 samples/sec Loss 3.5399 LearningRate 0.0243 Epoch: 10 Global Step: 126050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:19,088-Speed 3298.22 samples/sec Loss 3.5708 LearningRate 0.0243 Epoch: 10 Global Step: 126060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:22,220-Speed 3270.14 samples/sec Loss 3.4952 LearningRate 0.0243 Epoch: 10 Global Step: 126070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:25,294-Speed 3332.48 samples/sec Loss 3.5331 LearningRate 0.0243 Epoch: 10 Global Step: 126080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:28,456-Speed 3239.68 samples/sec Loss 3.5913 LearningRate 0.0242 Epoch: 10 Global Step: 126090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:31,587-Speed 3272.18 samples/sec Loss 3.5764 LearningRate 0.0242 Epoch: 10 Global Step: 126100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:34,691-Speed 3299.61 samples/sec Loss 3.5418 LearningRate 0.0242 Epoch: 10 Global Step: 126110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:37,765-Speed 3332.57 samples/sec Loss 3.5318 LearningRate 0.0242 Epoch: 10 Global Step: 126120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:40,821-Speed 3351.88 samples/sec Loss 3.5630 LearningRate 0.0242 Epoch: 10 Global Step: 126130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:38:43,958-Speed 3264.99 samples/sec Loss 3.4977 LearningRate 0.0242 Epoch: 10 Global Step: 126140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:38:47,037-Speed 3327.43 samples/sec Loss 3.5783 LearningRate 0.0242 Epoch: 10 Global Step: 126150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 12:38:50,075-Speed 3370.86 samples/sec Loss 3.5099 LearningRate 0.0242 Epoch: 10 Global Step: 126160 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:53,183-Speed 3296.28 samples/sec Loss 3.4800 LearningRate 0.0242 Epoch: 10 Global Step: 126170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:56,251-Speed 3338.77 samples/sec Loss 3.5775 LearningRate 0.0242 Epoch: 10 Global Step: 126180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:38:59,285-Speed 3376.38 samples/sec Loss 3.5676 LearningRate 0.0242 Epoch: 10 Global Step: 126190 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:02,379-Speed 3310.51 samples/sec Loss 3.6424 LearningRate 0.0242 Epoch: 10 Global Step: 126200 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:05,451-Speed 3333.97 samples/sec Loss 3.5671 LearningRate 0.0242 Epoch: 10 Global Step: 126210 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:08,528-Speed 3328.72 samples/sec Loss 3.4800 LearningRate 0.0242 Epoch: 10 Global Step: 126220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:11,693-Speed 3237.22 samples/sec Loss 3.5593 LearningRate 0.0242 Epoch: 10 Global Step: 126230 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:14,790-Speed 3307.78 samples/sec Loss 3.6146 LearningRate 0.0242 Epoch: 10 Global Step: 126240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:17,848-Speed 3349.97 samples/sec Loss 3.5667 LearningRate 0.0242 Epoch: 10 Global Step: 126250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:20,907-Speed 3348.41 samples/sec Loss 3.6126 LearningRate 0.0242 Epoch: 10 Global Step: 126260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:24,049-Speed 3259.81 samples/sec Loss 3.4893 LearningRate 0.0242 Epoch: 10 Global Step: 126270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:27,142-Speed 3311.98 samples/sec Loss 3.5622 LearningRate 0.0242 Epoch: 10 Global Step: 126280 Fp16 Grad Scale: 16384 Required: 11 hours 
Training: 2022-04-27 12:39:30,188-Speed 3362.55 samples/sec Loss 3.6750 LearningRate 0.0242 Epoch: 10 Global Step: 126290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:33,278-Speed 3315.19 samples/sec Loss 3.5352 LearningRate 0.0242 Epoch: 10 Global Step: 126300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:36,356-Speed 3327.99 samples/sec Loss 3.5226 LearningRate 0.0242 Epoch: 10 Global Step: 126310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:39,550-Speed 3206.78 samples/sec Loss 3.5363 LearningRate 0.0242 Epoch: 10 Global Step: 126320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:42,629-Speed 3326.43 samples/sec Loss 3.6804 LearningRate 0.0242 Epoch: 10 Global Step: 126330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:45,700-Speed 3336.10 samples/sec Loss 3.5386 LearningRate 0.0241 Epoch: 10 Global Step: 126340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:48,776-Speed 3330.12 samples/sec Loss 3.5372 LearningRate 0.0241 Epoch: 10 Global Step: 126350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:51,938-Speed 3239.23 samples/sec Loss 3.6800 LearningRate 0.0241 Epoch: 10 Global Step: 126360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:55,095-Speed 3245.29 samples/sec Loss 3.5709 LearningRate 0.0241 Epoch: 10 Global Step: 126370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:39:58,160-Speed 3341.08 samples/sec Loss 3.5316 LearningRate 0.0241 Epoch: 10 Global Step: 126380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:01,258-Speed 3306.99 samples/sec Loss 3.6381 LearningRate 0.0241 Epoch: 10 Global Step: 126390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:40:04,337-Speed 3327.41 samples/sec Loss 3.6057 LearningRate 0.0241 Epoch: 10 Global Step: 126400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
12:40:07,479-Speed 3259.34 samples/sec Loss 3.5730 LearningRate 0.0241 Epoch: 10 Global Step: 126410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:10,556-Speed 3329.33 samples/sec Loss 3.5422 LearningRate 0.0241 Epoch: 10 Global Step: 126420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:13,693-Speed 3264.73 samples/sec Loss 3.5414 LearningRate 0.0241 Epoch: 10 Global Step: 126430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:16,782-Speed 3316.60 samples/sec Loss 3.6800 LearningRate 0.0241 Epoch: 10 Global Step: 126440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:19,931-Speed 3252.74 samples/sec Loss 3.5674 LearningRate 0.0241 Epoch: 10 Global Step: 126450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:23,071-Speed 3262.57 samples/sec Loss 3.5544 LearningRate 0.0241 Epoch: 10 Global Step: 126460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:26,140-Speed 3337.10 samples/sec Loss 3.5690 LearningRate 0.0241 Epoch: 10 Global Step: 126470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:29,272-Speed 3270.69 samples/sec Loss 3.6430 LearningRate 0.0241 Epoch: 10 Global Step: 126480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:32,372-Speed 3304.78 samples/sec Loss 3.6589 LearningRate 0.0241 Epoch: 10 Global Step: 126490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 12:40:35,471-Speed 3304.99 samples/sec Loss 3.5079 LearningRate 0.0241 Epoch: 10 Global Step: 126500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 12:40:38,552-Speed 3323.78 samples/sec Loss 3.6046 LearningRate 0.0241 Epoch: 10 Global Step: 126510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:40:41,631-Speed 3328.10 samples/sec Loss 3.6558 LearningRate 0.0241 Epoch: 10 Global Step: 126520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:40:44,730-Speed 3305.06 
samples/sec Loss 3.6464 LearningRate 0.0241 Epoch: 10 Global Step: 126530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:40:47,842-Speed 3291.49 samples/sec Loss 3.6726 LearningRate 0.0241 Epoch: 10 Global Step: 126540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:40:50,916-Speed 3332.14 samples/sec Loss 3.5770 LearningRate 0.0241 Epoch: 10 Global Step: 126550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:40:53,973-Speed 3350.79 samples/sec Loss 3.6030 LearningRate 0.0241 Epoch: 10 Global Step: 126560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:40:57,067-Speed 3310.43 samples/sec Loss 3.6012 LearningRate 0.0241 Epoch: 10 Global Step: 126570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:00,138-Speed 3336.14 samples/sec Loss 3.6427 LearningRate 0.0241 Epoch: 10 Global Step: 126580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:03,253-Speed 3288.26 samples/sec Loss 3.6410 LearningRate 0.0241 Epoch: 10 Global Step: 126590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:06,356-Speed 3300.80 samples/sec Loss 3.6190 LearningRate 0.0240 Epoch: 10 Global Step: 126600 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:09,429-Speed 3333.58 samples/sec Loss 3.6052 LearningRate 0.0240 Epoch: 10 Global Step: 126610 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:12,520-Speed 3313.76 samples/sec Loss 3.5866 LearningRate 0.0240 Epoch: 10 Global Step: 126620 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:15,620-Speed 3304.70 samples/sec Loss 3.6389 LearningRate 0.0240 Epoch: 10 Global Step: 126630 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:18,686-Speed 3340.11 samples/sec Loss 3.5105 LearningRate 0.0240 Epoch: 10 Global Step: 126640 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:21,743-Speed 3351.29 samples/sec Loss 3.6205 
LearningRate 0.0240 Epoch: 10 Global Step: 126650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:24,839-Speed 3308.14 samples/sec Loss 3.6416 LearningRate 0.0240 Epoch: 10 Global Step: 126660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:27,959-Speed 3283.64 samples/sec Loss 3.6162 LearningRate 0.0240 Epoch: 10 Global Step: 126670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:41:31,055-Speed 3308.56 samples/sec Loss 3.6054 LearningRate 0.0240 Epoch: 10 Global Step: 126680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:34,147-Speed 3312.35 samples/sec Loss 3.6965 LearningRate 0.0240 Epoch: 10 Global Step: 126690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:37,287-Speed 3261.77 samples/sec Loss 3.6113 LearningRate 0.0240 Epoch: 10 Global Step: 126700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:40,341-Speed 3354.64 samples/sec Loss 3.6528 LearningRate 0.0240 Epoch: 10 Global Step: 126710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:43,428-Speed 3318.17 samples/sec Loss 3.5532 LearningRate 0.0240 Epoch: 10 Global Step: 126720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:46,505-Speed 3328.67 samples/sec Loss 3.6191 LearningRate 0.0240 Epoch: 10 Global Step: 126730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:49,640-Speed 3267.20 samples/sec Loss 3.6336 LearningRate 0.0240 Epoch: 10 Global Step: 126740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:52,738-Speed 3307.23 samples/sec Loss 3.5923 LearningRate 0.0240 Epoch: 10 Global Step: 126750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:55,905-Speed 3233.86 samples/sec Loss 3.6170 LearningRate 0.0240 Epoch: 10 Global Step: 126760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:41:59,001-Speed 3308.53 samples/sec Loss 3.6817 LearningRate 0.0240 Epoch: 10 
Global Step: 126770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:42:02,086-Speed 3320.27 samples/sec Loss 3.6767 LearningRate 0.0240 Epoch: 10 Global Step: 126780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:42:05,214-Speed 3275.17 samples/sec Loss 3.5567 LearningRate 0.0240 Epoch: 10 Global Step: 126790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:42:08,262-Speed 3360.93 samples/sec Loss 3.6441 LearningRate 0.0240 Epoch: 10 Global Step: 126800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:11,345-Speed 3322.32 samples/sec Loss 3.7325 LearningRate 0.0240 Epoch: 10 Global Step: 126810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:14,421-Speed 3329.76 samples/sec Loss 3.5792 LearningRate 0.0240 Epoch: 10 Global Step: 126820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:17,481-Speed 3347.83 samples/sec Loss 3.6471 LearningRate 0.0240 Epoch: 10 Global Step: 126830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:20,545-Speed 3343.32 samples/sec Loss 3.6224 LearningRate 0.0240 Epoch: 10 Global Step: 126840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:23,619-Speed 3332.23 samples/sec Loss 3.5543 LearningRate 0.0239 Epoch: 10 Global Step: 126850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:26,695-Speed 3330.27 samples/sec Loss 3.6167 LearningRate 0.0239 Epoch: 10 Global Step: 126860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:29,780-Speed 3320.73 samples/sec Loss 3.6604 LearningRate 0.0239 Epoch: 10 Global Step: 126870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:32,872-Speed 3312.53 samples/sec Loss 3.6359 LearningRate 0.0239 Epoch: 10 Global Step: 126880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:35,966-Speed 3311.03 samples/sec Loss 3.6416 LearningRate 0.0239 Epoch: 10 Global Step: 126890 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 12:42:39,093-Speed 3275.64 samples/sec Loss 3.5871 LearningRate 0.0239 Epoch: 10 Global Step: 126900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:42:42,181-Speed 3316.08 samples/sec Loss 3.6097 LearningRate 0.0239 Epoch: 10 Global Step: 126910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:42:45,289-Speed 3296.25 samples/sec Loss 3.6908 LearningRate 0.0239 Epoch: 10 Global Step: 126920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:42:48,464-Speed 3226.00 samples/sec Loss 3.6517 LearningRate 0.0239 Epoch: 10 Global Step: 126930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:42:51,554-Speed 3314.76 samples/sec Loss 3.6263 LearningRate 0.0239 Epoch: 10 Global Step: 126940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:42:54,643-Speed 3316.29 samples/sec Loss 3.6564 LearningRate 0.0239 Epoch: 10 Global Step: 126950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:42:57,728-Speed 3320.17 samples/sec Loss 3.6346 LearningRate 0.0239 Epoch: 10 Global Step: 126960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:00,825-Speed 3307.93 samples/sec Loss 3.6171 LearningRate 0.0239 Epoch: 10 Global Step: 126970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:03,926-Speed 3302.62 samples/sec Loss 3.6563 LearningRate 0.0239 Epoch: 10 Global Step: 126980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:07,013-Speed 3317.88 samples/sec Loss 3.6626 LearningRate 0.0239 Epoch: 10 Global Step: 126990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:10,063-Speed 3359.42 samples/sec Loss 3.6975 LearningRate 0.0239 Epoch: 10 Global Step: 127000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:13,144-Speed 3324.47 samples/sec Loss 3.7069 LearningRate 0.0239 Epoch: 10 Global Step: 127010 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 12:43:16,231-Speed 3318.29 samples/sec Loss 3.6891 LearningRate 0.0239 Epoch: 10 Global Step: 127020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:19,310-Speed 3326.37 samples/sec Loss 3.7005 LearningRate 0.0239 Epoch: 10 Global Step: 127030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:22,388-Speed 3328.47 samples/sec Loss 3.7700 LearningRate 0.0239 Epoch: 10 Global Step: 127040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:25,540-Speed 3249.69 samples/sec Loss 3.6991 LearningRate 0.0239 Epoch: 10 Global Step: 127050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:28,693-Speed 3248.71 samples/sec Loss 3.7148 LearningRate 0.0239 Epoch: 10 Global Step: 127060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:31,832-Speed 3263.14 samples/sec Loss 3.6598 LearningRate 0.0239 Epoch: 10 Global Step: 127070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:43:34,938-Speed 3298.04 samples/sec Loss 3.6561 LearningRate 0.0239 Epoch: 10 Global Step: 127080 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:43:38,065-Speed 3275.02 samples/sec Loss 3.7333 LearningRate 0.0239 Epoch: 10 Global Step: 127090 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:43:41,161-Speed 3309.43 samples/sec Loss 3.7402 LearningRate 0.0239 Epoch: 10 Global Step: 127100 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:43:44,303-Speed 3259.23 samples/sec Loss 3.7204 LearningRate 0.0238 Epoch: 10 Global Step: 127110 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:43:47,397-Speed 3310.53 samples/sec Loss 3.6245 LearningRate 0.0238 Epoch: 10 Global Step: 127120 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:43:50,487-Speed 3315.06 samples/sec Loss 3.6499 LearningRate 0.0238 Epoch: 10 Global Step: 127130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
12:43:53,618-Speed 3272.35 samples/sec Loss 3.6873 LearningRate 0.0238 Epoch: 10 Global Step: 127140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:43:56,713-Speed 3309.45 samples/sec Loss 3.6267 LearningRate 0.0238 Epoch: 10 Global Step: 127150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:43:59,818-Speed 3298.33 samples/sec Loss 3.7563 LearningRate 0.0238 Epoch: 10 Global Step: 127160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:44:02,917-Speed 3305.64 samples/sec Loss 3.6301 LearningRate 0.0238 Epoch: 10 Global Step: 127170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:44:06,065-Speed 3254.16 samples/sec Loss 3.6183 LearningRate 0.0238 Epoch: 10 Global Step: 127180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:09,175-Speed 3292.80 samples/sec Loss 3.6785 LearningRate 0.0238 Epoch: 10 Global Step: 127190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:12,266-Speed 3314.23 samples/sec Loss 3.7136 LearningRate 0.0238 Epoch: 10 Global Step: 127200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:15,375-Speed 3295.32 samples/sec Loss 3.6855 LearningRate 0.0238 Epoch: 10 Global Step: 127210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:18,518-Speed 3259.05 samples/sec Loss 3.6956 LearningRate 0.0238 Epoch: 10 Global Step: 127220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:21,602-Speed 3321.53 samples/sec Loss 3.6599 LearningRate 0.0238 Epoch: 10 Global Step: 127230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:24,844-Speed 3159.37 samples/sec Loss 3.8049 LearningRate 0.0238 Epoch: 10 Global Step: 127240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:27,985-Speed 3261.31 samples/sec Loss 3.5800 LearningRate 0.0238 Epoch: 10 Global Step: 127250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:31,155-Speed 3230.89 
samples/sec Loss 3.7151 LearningRate 0.0238 Epoch: 10 Global Step: 127260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:34,225-Speed 3336.42 samples/sec Loss 3.6949 LearningRate 0.0238 Epoch: 10 Global Step: 127270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:37,311-Speed 3319.24 samples/sec Loss 3.6228 LearningRate 0.0238 Epoch: 10 Global Step: 127280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:40,525-Speed 3187.12 samples/sec Loss 3.6924 LearningRate 0.0238 Epoch: 10 Global Step: 127290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:43,616-Speed 3313.57 samples/sec Loss 3.6452 LearningRate 0.0238 Epoch: 10 Global Step: 127300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:44:46,691-Speed 3330.65 samples/sec Loss 3.6739 LearningRate 0.0238 Epoch: 10 Global Step: 127310 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:44:49,781-Speed 3315.30 samples/sec Loss 3.6889 LearningRate 0.0238 Epoch: 10 Global Step: 127320 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:44:52,901-Speed 3283.13 samples/sec Loss 3.7707 LearningRate 0.0238 Epoch: 10 Global Step: 127330 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:44:55,971-Speed 3337.04 samples/sec Loss 3.7576 LearningRate 0.0238 Epoch: 10 Global Step: 127340 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:44:59,105-Speed 3267.83 samples/sec Loss 3.7677 LearningRate 0.0238 Epoch: 10 Global Step: 127350 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:02,221-Speed 3288.15 samples/sec Loss 3.6750 LearningRate 0.0237 Epoch: 10 Global Step: 127360 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:05,344-Speed 3279.17 samples/sec Loss 3.6880 LearningRate 0.0237 Epoch: 10 Global Step: 127370 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:08,443-Speed 3305.40 samples/sec Loss 3.6666 
LearningRate 0.0237 Epoch: 10 Global Step: 127380 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:11,542-Speed 3305.83 samples/sec Loss 3.7599 LearningRate 0.0237 Epoch: 10 Global Step: 127390 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:14,689-Speed 3254.54 samples/sec Loss 3.7611 LearningRate 0.0237 Epoch: 10 Global Step: 127400 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:17,783-Speed 3311.23 samples/sec Loss 3.6517 LearningRate 0.0237 Epoch: 10 Global Step: 127410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:45:20,882-Speed 3305.56 samples/sec Loss 3.6029 LearningRate 0.0237 Epoch: 10 Global Step: 127420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:45:23,999-Speed 3285.91 samples/sec Loss 3.7519 LearningRate 0.0237 Epoch: 10 Global Step: 127430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:27,098-Speed 3305.11 samples/sec Loss 3.6918 LearningRate 0.0237 Epoch: 10 Global Step: 127440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:30,208-Speed 3294.13 samples/sec Loss 3.7679 LearningRate 0.0237 Epoch: 10 Global Step: 127450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:33,301-Speed 3311.85 samples/sec Loss 3.7138 LearningRate 0.0237 Epoch: 10 Global Step: 127460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:36,404-Speed 3300.94 samples/sec Loss 3.7099 LearningRate 0.0237 Epoch: 10 Global Step: 127470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:39,603-Speed 3202.14 samples/sec Loss 3.7295 LearningRate 0.0237 Epoch: 10 Global Step: 127480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:42,734-Speed 3271.04 samples/sec Loss 3.6810 LearningRate 0.0237 Epoch: 10 Global Step: 127490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:45,820-Speed 3319.63 samples/sec Loss 3.5438 LearningRate 0.0237 Epoch: 10 
Global Step: 127500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:48,923-Speed 3301.24 samples/sec Loss 3.7153 LearningRate 0.0237 Epoch: 10 Global Step: 127510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:52,080-Speed 3243.85 samples/sec Loss 3.6059 LearningRate 0.0237 Epoch: 10 Global Step: 127520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:45:55,242-Speed 3239.94 samples/sec Loss 3.6889 LearningRate 0.0237 Epoch: 10 Global Step: 127530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:45:58,333-Speed 3314.41 samples/sec Loss 3.8115 LearningRate 0.0237 Epoch: 10 Global Step: 127540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:01,455-Speed 3280.09 samples/sec Loss 3.7152 LearningRate 0.0237 Epoch: 10 Global Step: 127550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:04,598-Speed 3259.28 samples/sec Loss 3.7302 LearningRate 0.0237 Epoch: 10 Global Step: 127560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:07,739-Speed 3261.32 samples/sec Loss 3.7160 LearningRate 0.0237 Epoch: 10 Global Step: 127570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:10,833-Speed 3310.81 samples/sec Loss 3.6838 LearningRate 0.0237 Epoch: 10 Global Step: 127580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:13,916-Speed 3322.87 samples/sec Loss 3.7103 LearningRate 0.0237 Epoch: 10 Global Step: 127590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:17,006-Speed 3314.65 samples/sec Loss 3.7308 LearningRate 0.0237 Epoch: 10 Global Step: 127600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:20,069-Speed 3344.11 samples/sec Loss 3.6849 LearningRate 0.0237 Epoch: 10 Global Step: 127610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:23,177-Speed 3296.27 samples/sec Loss 3.7735 LearningRate 0.0236 Epoch: 10 Global Step: 127620 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:26,350-Speed 3227.95 samples/sec Loss 3.7050 LearningRate 0.0236 Epoch: 10 Global Step: 127630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:46:29,540-Speed 3211.42 samples/sec Loss 3.7872 LearningRate 0.0236 Epoch: 10 Global Step: 127640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:46:32,634-Speed 3310.12 samples/sec Loss 3.7354 LearningRate 0.0236 Epoch: 10 Global Step: 127650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:35,724-Speed 3315.17 samples/sec Loss 3.7464 LearningRate 0.0236 Epoch: 10 Global Step: 127660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:38,878-Speed 3247.47 samples/sec Loss 3.7340 LearningRate 0.0236 Epoch: 10 Global Step: 127670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:41,963-Speed 3321.08 samples/sec Loss 3.7461 LearningRate 0.0236 Epoch: 10 Global Step: 127680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:45,042-Speed 3326.12 samples/sec Loss 3.7460 LearningRate 0.0236 Epoch: 10 Global Step: 127690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:48,161-Speed 3284.04 samples/sec Loss 3.6932 LearningRate 0.0236 Epoch: 10 Global Step: 127700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:46:51,279-Speed 3285.14 samples/sec Loss 3.7332 LearningRate 0.0236 Epoch: 10 Global Step: 127710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:46:54,404-Speed 3277.71 samples/sec Loss 3.7654 LearningRate 0.0236 Epoch: 10 Global Step: 127720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:46:57,530-Speed 3276.68 samples/sec Loss 3.7145 LearningRate 0.0236 Epoch: 10 Global Step: 127730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:47:00,636-Speed 3297.65 samples/sec Loss 3.7007 LearningRate 0.0236 Epoch: 10 Global Step: 127740 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-04-27 12:47:03,752-Speed 3287.60 samples/sec Loss 3.7644 LearningRate 0.0236 Epoch: 10 Global Step: 127750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:47:06,826-Speed 3332.63 samples/sec Loss 3.7251 LearningRate 0.0236 Epoch: 10 Global Step: 127760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:47:09,902-Speed 3329.96 samples/sec Loss 3.6969 LearningRate 0.0236 Epoch: 10 Global Step: 127770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:47:13,020-Speed 3285.43 samples/sec Loss 3.7622 LearningRate 0.0236 Epoch: 10 Global Step: 127780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:47:16,140-Speed 3282.64 samples/sec Loss 3.7521 LearningRate 0.0236 Epoch: 10 Global Step: 127790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:47:19,288-Speed 3254.20 samples/sec Loss 3.7874 LearningRate 0.0236 Epoch: 10 Global Step: 127800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:47:22,419-Speed 3271.34 samples/sec Loss 3.7486 LearningRate 0.0236 Epoch: 10 Global Step: 127810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:25,517-Speed 3306.89 samples/sec Loss 3.7587 LearningRate 0.0236 Epoch: 10 Global Step: 127820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:28,661-Speed 3257.80 samples/sec Loss 3.7843 LearningRate 0.0236 Epoch: 10 Global Step: 127830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:31,771-Speed 3293.99 samples/sec Loss 3.7176 LearningRate 0.0236 Epoch: 10 Global Step: 127840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:34,845-Speed 3332.29 samples/sec Loss 3.7402 LearningRate 0.0236 Epoch: 10 Global Step: 127850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:37,925-Speed 3325.78 samples/sec Loss 3.7052 LearningRate 0.0236 Epoch: 10 Global Step: 127860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
12:47:41,026-Speed 3302.75 samples/sec Loss 3.6566 LearningRate 0.0235 Epoch: 10 Global Step: 127870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:44,160-Speed 3268.65 samples/sec Loss 3.7601 LearningRate 0.0235 Epoch: 10 Global Step: 127880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:47,274-Speed 3289.28 samples/sec Loss 3.8932 LearningRate 0.0235 Epoch: 10 Global Step: 127890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:50,379-Speed 3298.98 samples/sec Loss 3.7136 LearningRate 0.0235 Epoch: 10 Global Step: 127900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:53,528-Speed 3252.79 samples/sec Loss 3.7245 LearningRate 0.0235 Epoch: 10 Global Step: 127910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:47:56,575-Speed 3361.65 samples/sec Loss 3.7328 LearningRate 0.0235 Epoch: 10 Global Step: 127920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:47:59,745-Speed 3231.28 samples/sec Loss 3.7873 LearningRate 0.0235 Epoch: 10 Global Step: 127930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:02,936-Speed 3210.34 samples/sec Loss 3.6968 LearningRate 0.0235 Epoch: 10 Global Step: 127940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:06,124-Speed 3212.45 samples/sec Loss 3.8140 LearningRate 0.0235 Epoch: 10 Global Step: 127950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:09,211-Speed 3318.37 samples/sec Loss 3.7234 LearningRate 0.0235 Epoch: 10 Global Step: 127960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:12,452-Speed 3160.93 samples/sec Loss 3.7781 LearningRate 0.0235 Epoch: 10 Global Step: 127970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:15,601-Speed 3252.60 samples/sec Loss 3.6861 LearningRate 0.0235 Epoch: 10 Global Step: 127980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:18,752-Speed 3250.89 
samples/sec Loss 3.7990 LearningRate 0.0235 Epoch: 10 Global Step: 127990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:21,855-Speed 3300.81 samples/sec Loss 3.7581 LearningRate 0.0235 Epoch: 10 Global Step: 128000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:25,000-Speed 3256.58 samples/sec Loss 3.8034 LearningRate 0.0235 Epoch: 10 Global Step: 128010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:28,101-Speed 3303.10 samples/sec Loss 3.8114 LearningRate 0.0235 Epoch: 10 Global Step: 128020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:48:31,220-Speed 3284.23 samples/sec Loss 3.8173 LearningRate 0.0235 Epoch: 10 Global Step: 128030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:48:34,352-Speed 3271.31 samples/sec Loss 3.7872 LearningRate 0.0235 Epoch: 10 Global Step: 128040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:37,438-Speed 3319.08 samples/sec Loss 3.6613 LearningRate 0.0235 Epoch: 10 Global Step: 128050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:40,553-Speed 3287.97 samples/sec Loss 3.8302 LearningRate 0.0235 Epoch: 10 Global Step: 128060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:43,648-Speed 3309.68 samples/sec Loss 3.7696 LearningRate 0.0235 Epoch: 10 Global Step: 128070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:46,730-Speed 3323.58 samples/sec Loss 3.8566 LearningRate 0.0235 Epoch: 10 Global Step: 128080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:49,868-Speed 3264.20 samples/sec Loss 3.8196 LearningRate 0.0235 Epoch: 10 Global Step: 128090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:53,122-Speed 3147.93 samples/sec Loss 3.7215 LearningRate 0.0235 Epoch: 10 Global Step: 128100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:56,280-Speed 3244.19 samples/sec Loss 3.7147 
LearningRate 0.0235 Epoch: 10 Global Step: 128110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:48:59,345-Speed 3341.72 samples/sec Loss 3.8471 LearningRate 0.0235 Epoch: 10 Global Step: 128120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:02,437-Speed 3312.62 samples/sec Loss 3.7650 LearningRate 0.0234 Epoch: 10 Global Step: 128130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:05,589-Speed 3249.61 samples/sec Loss 3.7659 LearningRate 0.0234 Epoch: 10 Global Step: 128140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:49:08,729-Speed 3262.67 samples/sec Loss 3.7680 LearningRate 0.0234 Epoch: 10 Global Step: 128150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:49:11,810-Speed 3324.42 samples/sec Loss 3.7809 LearningRate 0.0234 Epoch: 10 Global Step: 128160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:49:14,985-Speed 3226.47 samples/sec Loss 3.7199 LearningRate 0.0234 Epoch: 10 Global Step: 128170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:49:18,125-Speed 3261.91 samples/sec Loss 3.7406 LearningRate 0.0234 Epoch: 10 Global Step: 128180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:49:21,193-Speed 3338.58 samples/sec Loss 3.7982 LearningRate 0.0234 Epoch: 10 Global Step: 128190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:49:24,272-Speed 3327.12 samples/sec Loss 3.7502 LearningRate 0.0234 Epoch: 10 Global Step: 128200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:49:27,316-Speed 3365.47 samples/sec Loss 3.7969 LearningRate 0.0234 Epoch: 10 Global Step: 128210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:30,492-Speed 3224.79 samples/sec Loss 3.8076 LearningRate 0.0234 Epoch: 10 Global Step: 128220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:33,598-Speed 3297.71 samples/sec Loss 3.8186 LearningRate 0.0234 Epoch: 10 
Global Step: 128230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:36,727-Speed 3273.89 samples/sec Loss 3.7770 LearningRate 0.0234 Epoch: 10 Global Step: 128240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:39,868-Speed 3261.60 samples/sec Loss 3.8218 LearningRate 0.0234 Epoch: 10 Global Step: 128250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:43,003-Speed 3267.01 samples/sec Loss 3.8460 LearningRate 0.0234 Epoch: 10 Global Step: 128260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:46,066-Speed 3344.40 samples/sec Loss 3.7495 LearningRate 0.0234 Epoch: 10 Global Step: 128270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:49,200-Speed 3267.65 samples/sec Loss 3.6818 LearningRate 0.0234 Epoch: 10 Global Step: 128280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:52,334-Speed 3269.51 samples/sec Loss 3.8581 LearningRate 0.0234 Epoch: 10 Global Step: 128290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:55,409-Speed 3331.52 samples/sec Loss 3.7808 LearningRate 0.0234 Epoch: 10 Global Step: 128300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:49:58,500-Speed 3312.83 samples/sec Loss 3.8327 LearningRate 0.0234 Epoch: 10 Global Step: 128310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:50:01,643-Speed 3259.35 samples/sec Loss 3.7786 LearningRate 0.0234 Epoch: 10 Global Step: 128320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:50:04,841-Speed 3203.48 samples/sec Loss 3.8686 LearningRate 0.0234 Epoch: 10 Global Step: 128330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:50:07,889-Speed 3359.94 samples/sec Loss 3.8362 LearningRate 0.0234 Epoch: 10 Global Step: 128340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:50:10,926-Speed 3373.43 samples/sec Loss 3.8726 LearningRate 0.0234 Epoch: 10 Global Step: 128350 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:14,073-Speed 3254.50 samples/sec Loss 3.7942 LearningRate 0.0234 Epoch: 10 Global Step: 128360 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:17,286-Speed 3188.35 samples/sec Loss 3.8330 LearningRate 0.0234 Epoch: 10 Global Step: 128370 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:20,383-Speed 3306.54 samples/sec Loss 3.8647 LearningRate 0.0233 Epoch: 10 Global Step: 128380 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:23,497-Speed 3289.45 samples/sec Loss 3.7671 LearningRate 0.0233 Epoch: 10 Global Step: 128390 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:26,613-Speed 3288.02 samples/sec Loss 3.7557 LearningRate 0.0233 Epoch: 10 Global Step: 128400 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:29,814-Speed 3199.20 samples/sec Loss 3.7371 LearningRate 0.0233 Epoch: 10 Global Step: 128410 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:32,964-Speed 3251.89 samples/sec Loss 3.8818 LearningRate 0.0233 Epoch: 10 Global Step: 128420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:36,042-Speed 3328.39 samples/sec Loss 3.8631 LearningRate 0.0233 Epoch: 10 Global Step: 128430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:39,177-Speed 3267.36 samples/sec Loss 3.8444 LearningRate 0.0233 Epoch: 10 Global Step: 128440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:50:42,309-Speed 3270.75 samples/sec Loss 3.8520 LearningRate 0.0233 Epoch: 10 Global Step: 128450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:50:45,374-Speed 3341.81 samples/sec Loss 3.7969 LearningRate 0.0233 Epoch: 10 Global Step: 128460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:50:48,487-Speed 3289.96 samples/sec Loss 3.8759 LearningRate 0.0233 Epoch: 10 Global Step: 128470 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 12:50:51,595-Speed 3296.16 samples/sec Loss 3.8114 LearningRate 0.0233 Epoch: 10 Global Step: 128480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:50:54,740-Speed 3257.14 samples/sec Loss 3.7409 LearningRate 0.0233 Epoch: 10 Global Step: 128490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:50:57,810-Speed 3336.97 samples/sec Loss 3.8628 LearningRate 0.0233 Epoch: 10 Global Step: 128500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:00,881-Speed 3335.04 samples/sec Loss 3.8047 LearningRate 0.0233 Epoch: 10 Global Step: 128510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:03,979-Speed 3305.99 samples/sec Loss 3.7693 LearningRate 0.0233 Epoch: 10 Global Step: 128520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:07,102-Speed 3280.31 samples/sec Loss 3.7891 LearningRate 0.0233 Epoch: 10 Global Step: 128530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:10,165-Speed 3344.33 samples/sec Loss 3.7904 LearningRate 0.0233 Epoch: 10 Global Step: 128540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:13,265-Speed 3303.97 samples/sec Loss 3.7719 LearningRate 0.0233 Epoch: 10 Global Step: 128550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:51:16,359-Speed 3310.36 samples/sec Loss 3.7773 LearningRate 0.0233 Epoch: 10 Global Step: 128560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:19,466-Speed 3297.41 samples/sec Loss 3.6752 LearningRate 0.0233 Epoch: 10 Global Step: 128570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:22,543-Speed 3329.56 samples/sec Loss 3.8155 LearningRate 0.0233 Epoch: 10 Global Step: 128580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:25,648-Speed 3298.37 samples/sec Loss 3.8267 LearningRate 0.0233 Epoch: 10 Global Step: 128590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
12:51:28,716-Speed 3339.40 samples/sec Loss 3.7180 LearningRate 0.0233 Epoch: 10 Global Step: 128600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:31,787-Speed 3335.44 samples/sec Loss 3.8051 LearningRate 0.0233 Epoch: 10 Global Step: 128610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:34,854-Speed 3339.61 samples/sec Loss 3.8011 LearningRate 0.0233 Epoch: 10 Global Step: 128620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:37,914-Speed 3347.25 samples/sec Loss 3.8102 LearningRate 0.0233 Epoch: 10 Global Step: 128630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:40,985-Speed 3335.07 samples/sec Loss 3.8049 LearningRate 0.0232 Epoch: 10 Global Step: 128640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:44,102-Speed 3286.92 samples/sec Loss 3.8014 LearningRate 0.0232 Epoch: 10 Global Step: 128650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:47,230-Speed 3274.23 samples/sec Loss 3.8319 LearningRate 0.0232 Epoch: 10 Global Step: 128660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:51:50,374-Speed 3259.04 samples/sec Loss 3.7001 LearningRate 0.0232 Epoch: 10 Global Step: 128670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:51:53,489-Speed 3287.58 samples/sec Loss 3.8010 LearningRate 0.0232 Epoch: 10 Global Step: 128680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:51:56,568-Speed 3327.45 samples/sec Loss 3.7193 LearningRate 0.0232 Epoch: 10 Global Step: 128690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:51:59,701-Speed 3269.61 samples/sec Loss 3.8152 LearningRate 0.0232 Epoch: 10 Global Step: 128700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:02,842-Speed 3260.79 samples/sec Loss 3.7980 LearningRate 0.0232 Epoch: 10 Global Step: 128710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:05,956-Speed 3289.47 
samples/sec Loss 3.7785 LearningRate 0.0232 Epoch: 10 Global Step: 128720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:09,019-Speed 3344.62 samples/sec Loss 3.8519 LearningRate 0.0232 Epoch: 10 Global Step: 128730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:12,124-Speed 3298.30 samples/sec Loss 3.7489 LearningRate 0.0232 Epoch: 10 Global Step: 128740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:15,203-Speed 3327.85 samples/sec Loss 3.7510 LearningRate 0.0232 Epoch: 10 Global Step: 128750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:18,278-Speed 3330.61 samples/sec Loss 3.8602 LearningRate 0.0232 Epoch: 10 Global Step: 128760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:21,356-Speed 3328.49 samples/sec Loss 3.8495 LearningRate 0.0232 Epoch: 10 Global Step: 128770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:24,452-Speed 3308.03 samples/sec Loss 3.8375 LearningRate 0.0232 Epoch: 10 Global Step: 128780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:27,612-Speed 3241.38 samples/sec Loss 3.8349 LearningRate 0.0232 Epoch: 10 Global Step: 128790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:52:30,696-Speed 3322.11 samples/sec Loss 3.7733 LearningRate 0.0232 Epoch: 10 Global Step: 128800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:33,753-Speed 3350.07 samples/sec Loss 3.7879 LearningRate 0.0232 Epoch: 10 Global Step: 128810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:36,860-Speed 3297.50 samples/sec Loss 3.8613 LearningRate 0.0232 Epoch: 10 Global Step: 128820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:39,965-Speed 3298.62 samples/sec Loss 3.7310 LearningRate 0.0232 Epoch: 10 Global Step: 128830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:43,071-Speed 3298.31 samples/sec Loss 3.8361 
LearningRate 0.0232 Epoch: 10 Global Step: 128840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:46,166-Speed 3308.41 samples/sec Loss 3.7843 LearningRate 0.0232 Epoch: 10 Global Step: 128850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:49,302-Speed 3267.00 samples/sec Loss 3.8266 LearningRate 0.0232 Epoch: 10 Global Step: 128860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:52,427-Speed 3277.90 samples/sec Loss 3.8286 LearningRate 0.0232 Epoch: 10 Global Step: 128870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:55,498-Speed 3335.14 samples/sec Loss 3.8382 LearningRate 0.0232 Epoch: 10 Global Step: 128880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:52:58,565-Speed 3339.51 samples/sec Loss 3.8092 LearningRate 0.0232 Epoch: 10 Global Step: 128890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:53:01,642-Speed 3329.12 samples/sec Loss 3.8398 LearningRate 0.0231 Epoch: 10 Global Step: 128900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:53:04,724-Speed 3323.48 samples/sec Loss 3.7945 LearningRate 0.0231 Epoch: 10 Global Step: 128910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:53:07,817-Speed 3312.43 samples/sec Loss 3.7780 LearningRate 0.0231 Epoch: 10 Global Step: 128920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:53:10,914-Speed 3306.65 samples/sec Loss 3.8160 LearningRate 0.0231 Epoch: 10 Global Step: 128930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:53:14,039-Speed 3278.12 samples/sec Loss 3.8571 LearningRate 0.0231 Epoch: 10 Global Step: 128940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:53:17,177-Speed 3264.36 samples/sec Loss 3.8354 LearningRate 0.0231 Epoch: 10 Global Step: 128950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:53:20,256-Speed 3327.79 samples/sec Loss 3.8081 LearningRate 0.0231 Epoch: 10 
Global Step: 128960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:53:23,409-Speed 3248.12 samples/sec Loss 3.7629 LearningRate 0.0231 Epoch: 10 Global Step: 128970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:53:26,535-Speed 3277.18 samples/sec Loss 3.7547 LearningRate 0.0231 Epoch: 10 Global Step: 128980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:53:29,662-Speed 3275.52 samples/sec Loss 3.7762 LearningRate 0.0231 Epoch: 10 Global Step: 128990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:53:32,752-Speed 3314.92 samples/sec Loss 3.7952 LearningRate 0.0231 Epoch: 10 Global Step: 129000 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:35,920-Speed 3233.85 samples/sec Loss 3.8773 LearningRate 0.0231 Epoch: 10 Global Step: 129010 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:39,048-Speed 3273.91 samples/sec Loss 3.9231 LearningRate 0.0231 Epoch: 10 Global Step: 129020 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:42,185-Speed 3265.36 samples/sec Loss 3.7902 LearningRate 0.0231 Epoch: 10 Global Step: 129030 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:45,288-Speed 3301.49 samples/sec Loss 3.6934 LearningRate 0.0231 Epoch: 10 Global Step: 129040 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:48,390-Speed 3301.64 samples/sec Loss 3.8443 LearningRate 0.0231 Epoch: 10 Global Step: 129050 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:51,536-Speed 3255.71 samples/sec Loss 3.7716 LearningRate 0.0231 Epoch: 10 Global Step: 129060 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:54,710-Speed 3228.05 samples/sec Loss 3.7850 LearningRate 0.0231 Epoch: 10 Global Step: 129070 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:53:57,830-Speed 3282.34 samples/sec Loss 3.7483 LearningRate 0.0231 Epoch: 10 Global Step: 129080 Fp16 Grad Scale: 
8192 Required: 10 hours Training: 2022-04-27 12:54:00,982-Speed 3250.04 samples/sec Loss 3.8315 LearningRate 0.0231 Epoch: 10 Global Step: 129090 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:54:04,119-Speed 3265.18 samples/sec Loss 3.8524 LearningRate 0.0231 Epoch: 10 Global Step: 129100 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:07,280-Speed 3240.56 samples/sec Loss 3.8754 LearningRate 0.0231 Epoch: 10 Global Step: 129110 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:10,391-Speed 3292.80 samples/sec Loss 3.7767 LearningRate 0.0231 Epoch: 10 Global Step: 129120 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:13,487-Speed 3308.42 samples/sec Loss 3.8545 LearningRate 0.0231 Epoch: 10 Global Step: 129130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:16,593-Speed 3298.10 samples/sec Loss 3.8097 LearningRate 0.0231 Epoch: 10 Global Step: 129140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:19,722-Speed 3273.77 samples/sec Loss 3.8151 LearningRate 0.0231 Epoch: 10 Global Step: 129150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:22,838-Speed 3287.42 samples/sec Loss 3.8559 LearningRate 0.0230 Epoch: 10 Global Step: 129160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:25,986-Speed 3253.94 samples/sec Loss 3.7173 LearningRate 0.0230 Epoch: 10 Global Step: 129170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:29,205-Speed 3182.26 samples/sec Loss 3.7182 LearningRate 0.0230 Epoch: 10 Global Step: 129180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:32,356-Speed 3250.92 samples/sec Loss 3.9110 LearningRate 0.0230 Epoch: 10 Global Step: 129190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:35,491-Speed 3266.89 samples/sec Loss 3.8658 LearningRate 0.0230 Epoch: 10 Global Step: 129200 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 12:54:38,614-Speed 3280.52 samples/sec Loss 3.8962 LearningRate 0.0230 Epoch: 10 Global Step: 129210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:54:41,718-Speed 3299.64 samples/sec Loss 3.8123 LearningRate 0.0230 Epoch: 10 Global Step: 129220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:44,810-Speed 3312.67 samples/sec Loss 3.7922 LearningRate 0.0230 Epoch: 10 Global Step: 129230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:47,912-Speed 3302.09 samples/sec Loss 3.8323 LearningRate 0.0230 Epoch: 10 Global Step: 129240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:50,980-Speed 3338.70 samples/sec Loss 3.8768 LearningRate 0.0230 Epoch: 10 Global Step: 129250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:54,107-Speed 3276.05 samples/sec Loss 3.7983 LearningRate 0.0230 Epoch: 10 Global Step: 129260 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:54:57,190-Speed 3322.24 samples/sec Loss 3.7994 LearningRate 0.0230 Epoch: 10 Global Step: 129270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:00,299-Speed 3295.48 samples/sec Loss 3.8623 LearningRate 0.0230 Epoch: 10 Global Step: 129280 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:03,472-Speed 3228.33 samples/sec Loss 3.8962 LearningRate 0.0230 Epoch: 10 Global Step: 129290 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:06,593-Speed 3281.58 samples/sec Loss 3.8844 LearningRate 0.0230 Epoch: 10 Global Step: 129300 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:09,661-Speed 3339.15 samples/sec Loss 3.8056 LearningRate 0.0230 Epoch: 10 Global Step: 129310 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:12,869-Speed 3192.67 samples/sec Loss 3.8119 LearningRate 0.0230 Epoch: 10 Global Step: 129320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
12:55:15,949-Speed 3326.73 samples/sec Loss 3.8341 LearningRate 0.0230 Epoch: 10 Global Step: 129330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:19,026-Speed 3328.24 samples/sec Loss 3.9031 LearningRate 0.0230 Epoch: 10 Global Step: 129340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:22,111-Speed 3320.50 samples/sec Loss 3.8570 LearningRate 0.0230 Epoch: 10 Global Step: 129350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:25,240-Speed 3274.41 samples/sec Loss 3.9293 LearningRate 0.0230 Epoch: 10 Global Step: 129360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:28,406-Speed 3234.77 samples/sec Loss 3.7500 LearningRate 0.0230 Epoch: 10 Global Step: 129370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:31,527-Speed 3282.21 samples/sec Loss 3.8633 LearningRate 0.0230 Epoch: 10 Global Step: 129380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:34,597-Speed 3336.39 samples/sec Loss 3.8398 LearningRate 0.0230 Epoch: 10 Global Step: 129390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:37,773-Speed 3225.04 samples/sec Loss 3.9591 LearningRate 0.0230 Epoch: 10 Global Step: 129400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:40,979-Speed 3195.23 samples/sec Loss 3.8722 LearningRate 0.0230 Epoch: 10 Global Step: 129410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:55:44,022-Speed 3367.20 samples/sec Loss 3.7885 LearningRate 0.0229 Epoch: 10 Global Step: 129420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:47,129-Speed 3296.49 samples/sec Loss 3.8884 LearningRate 0.0229 Epoch: 10 Global Step: 129430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:50,235-Speed 3297.73 samples/sec Loss 3.8326 LearningRate 0.0229 Epoch: 10 Global Step: 129440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:53,349-Speed 3289.86 
samples/sec Loss 3.8384 LearningRate 0.0229 Epoch: 10 Global Step: 129450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:56,478-Speed 3272.96 samples/sec Loss 3.8236 LearningRate 0.0229 Epoch: 10 Global Step: 129460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:55:59,549-Speed 3336.12 samples/sec Loss 3.8177 LearningRate 0.0229 Epoch: 10 Global Step: 129470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:56:02,668-Speed 3283.71 samples/sec Loss 3.8881 LearningRate 0.0229 Epoch: 10 Global Step: 129480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:56:05,783-Speed 3288.60 samples/sec Loss 3.8318 LearningRate 0.0229 Epoch: 10 Global Step: 129490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:56:08,840-Speed 3350.24 samples/sec Loss 3.8534 LearningRate 0.0229 Epoch: 10 Global Step: 129500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:56:11,910-Speed 3336.67 samples/sec Loss 3.8558 LearningRate 0.0229 Epoch: 10 Global Step: 129510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:56:15,044-Speed 3268.94 samples/sec Loss 3.8019 LearningRate 0.0229 Epoch: 10 Global Step: 129520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:56:18,153-Speed 3295.52 samples/sec Loss 3.7792 LearningRate 0.0229 Epoch: 10 Global Step: 129530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:56:21,254-Speed 3302.10 samples/sec Loss 3.7840 LearningRate 0.0229 Epoch: 10 Global Step: 129540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:56:24,351-Speed 3308.23 samples/sec Loss 3.9000 LearningRate 0.0229 Epoch: 10 Global Step: 129550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:56:27,472-Speed 3281.92 samples/sec Loss 3.7857 LearningRate 0.0229 Epoch: 10 Global Step: 129560 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:30,585-Speed 3290.52 samples/sec Loss 3.8285 
LearningRate 0.0229 Epoch: 10 Global Step: 129570 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:33,650-Speed 3342.49 samples/sec Loss 3.9815 LearningRate 0.0229 Epoch: 10 Global Step: 129580 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:36,770-Speed 3282.61 samples/sec Loss 3.9096 LearningRate 0.0229 Epoch: 10 Global Step: 129590 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:39,854-Speed 3322.20 samples/sec Loss 3.9519 LearningRate 0.0229 Epoch: 10 Global Step: 129600 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:42,971-Speed 3286.29 samples/sec Loss 3.8325 LearningRate 0.0229 Epoch: 10 Global Step: 129610 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:46,031-Speed 3346.42 samples/sec Loss 3.8142 LearningRate 0.0229 Epoch: 10 Global Step: 129620 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:49,136-Speed 3299.08 samples/sec Loss 3.8485 LearningRate 0.0229 Epoch: 10 Global Step: 129630 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:52,255-Speed 3284.31 samples/sec Loss 3.9735 LearningRate 0.0229 Epoch: 10 Global Step: 129640 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:55,343-Speed 3317.66 samples/sec Loss 3.8533 LearningRate 0.0229 Epoch: 10 Global Step: 129650 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 12:56:58,467-Speed 3278.91 samples/sec Loss 3.8630 LearningRate 0.0229 Epoch: 10 Global Step: 129660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:01,558-Speed 3313.67 samples/sec Loss 3.8600 LearningRate 0.0229 Epoch: 10 Global Step: 129670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:04,757-Speed 3202.09 samples/sec Loss 3.8396 LearningRate 0.0228 Epoch: 10 Global Step: 129680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:07,886-Speed 3272.88 samples/sec Loss 3.8434 LearningRate 0.0228 Epoch: 10 Global 
Step: 129690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:10,943-Speed 3351.60 samples/sec Loss 3.8957 LearningRate 0.0228 Epoch: 10 Global Step: 129700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:14,047-Speed 3299.18 samples/sec Loss 3.9026 LearningRate 0.0228 Epoch: 10 Global Step: 129710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:17,224-Speed 3224.60 samples/sec Loss 3.9523 LearningRate 0.0228 Epoch: 10 Global Step: 129720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:20,338-Speed 3288.86 samples/sec Loss 3.8808 LearningRate 0.0228 Epoch: 10 Global Step: 129730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:23,444-Speed 3298.54 samples/sec Loss 3.8714 LearningRate 0.0228 Epoch: 10 Global Step: 129740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:26,560-Speed 3287.17 samples/sec Loss 3.7950 LearningRate 0.0228 Epoch: 10 Global Step: 129750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:29,708-Speed 3254.44 samples/sec Loss 3.9097 LearningRate 0.0228 Epoch: 10 Global Step: 129760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:57:32,830-Speed 3280.86 samples/sec Loss 3.8574 LearningRate 0.0228 Epoch: 10 Global Step: 129770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:57:35,909-Speed 3326.58 samples/sec Loss 3.9113 LearningRate 0.0228 Epoch: 10 Global Step: 129780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:57:39,013-Speed 3299.39 samples/sec Loss 3.7827 LearningRate 0.0228 Epoch: 10 Global Step: 129790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:57:42,121-Speed 3296.63 samples/sec Loss 3.8390 LearningRate 0.0228 Epoch: 10 Global Step: 129800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:45,215-Speed 3311.08 samples/sec Loss 3.9263 LearningRate 0.0228 Epoch: 10 Global Step: 129810 Fp16 Grad Scale: 
16384 Required: 10 hours Training: 2022-04-27 12:57:48,275-Speed 3347.08 samples/sec Loss 3.8059 LearningRate 0.0228 Epoch: 10 Global Step: 129820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:51,423-Speed 3253.50 samples/sec Loss 3.8816 LearningRate 0.0228 Epoch: 10 Global Step: 129830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:54,542-Speed 3284.29 samples/sec Loss 3.7876 LearningRate 0.0228 Epoch: 10 Global Step: 129840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:57:57,602-Speed 3347.24 samples/sec Loss 3.9069 LearningRate 0.0228 Epoch: 10 Global Step: 129850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:58:00,759-Speed 3245.06 samples/sec Loss 3.9121 LearningRate 0.0228 Epoch: 10 Global Step: 129860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:58:03,868-Speed 3294.59 samples/sec Loss 3.9323 LearningRate 0.0228 Epoch: 10 Global Step: 129870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:58:06,983-Speed 3288.24 samples/sec Loss 3.8608 LearningRate 0.0228 Epoch: 10 Global Step: 129880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:58:10,084-Speed 3303.24 samples/sec Loss 4.0245 LearningRate 0.0228 Epoch: 10 Global Step: 129890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:58:13,236-Speed 3249.97 samples/sec Loss 3.8920 LearningRate 0.0228 Epoch: 10 Global Step: 129900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:16,309-Speed 3332.75 samples/sec Loss 3.8933 LearningRate 0.0228 Epoch: 10 Global Step: 129910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:19,454-Speed 3257.81 samples/sec Loss 3.9073 LearningRate 0.0228 Epoch: 10 Global Step: 129920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:22,561-Speed 3296.05 samples/sec Loss 3.9438 LearningRate 0.0228 Epoch: 10 Global Step: 129930 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 12:58:25,716-Speed 3247.28 samples/sec Loss 3.8594 LearningRate 0.0227 Epoch: 10 Global Step: 129940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:28,802-Speed 3318.37 samples/sec Loss 3.8515 LearningRate 0.0227 Epoch: 10 Global Step: 129950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:31,902-Speed 3306.30 samples/sec Loss 3.8477 LearningRate 0.0227 Epoch: 10 Global Step: 129960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:35,043-Speed 3261.01 samples/sec Loss 3.8630 LearningRate 0.0227 Epoch: 10 Global Step: 129970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:38,114-Speed 3335.41 samples/sec Loss 3.8179 LearningRate 0.0227 Epoch: 10 Global Step: 129980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:41,251-Speed 3265.10 samples/sec Loss 3.7965 LearningRate 0.0227 Epoch: 10 Global Step: 129990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:58:44,356-Speed 3298.85 samples/sec Loss 3.9606 LearningRate 0.0227 Epoch: 10 Global Step: 130000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:58:47,445-Speed 3316.59 samples/sec Loss 3.7857 LearningRate 0.0227 Epoch: 10 Global Step: 130010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:58:50,553-Speed 3295.75 samples/sec Loss 3.9231 LearningRate 0.0227 Epoch: 10 Global Step: 130020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:58:53,626-Speed 3332.72 samples/sec Loss 3.8454 LearningRate 0.0227 Epoch: 10 Global Step: 130030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:58:56,688-Speed 3345.58 samples/sec Loss 3.9171 LearningRate 0.0227 Epoch: 10 Global Step: 130040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 12:58:59,804-Speed 3286.97 samples/sec Loss 3.8520 LearningRate 0.0227 Epoch: 10 Global Step: 130050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
12:59:02,940-Speed 3266.50 samples/sec Loss 3.8614 LearningRate 0.0227 Epoch: 10 Global Step: 130060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:59:06,086-Speed 3256.01 samples/sec Loss 3.8832 LearningRate 0.0227 Epoch: 10 Global Step: 130070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:59:09,147-Speed 3345.89 samples/sec Loss 3.9446 LearningRate 0.0227 Epoch: 10 Global Step: 130080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:59:12,286-Speed 3264.01 samples/sec Loss 3.8534 LearningRate 0.0227 Epoch: 10 Global Step: 130090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 12:59:15,381-Speed 3309.10 samples/sec Loss 3.8424 LearningRate 0.0227 Epoch: 10 Global Step: 130100 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:18,464-Speed 3322.32 samples/sec Loss 3.9689 LearningRate 0.0227 Epoch: 10 Global Step: 130110 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:21,519-Speed 3353.40 samples/sec Loss 3.7965 LearningRate 0.0227 Epoch: 10 Global Step: 130120 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:24,611-Speed 3313.13 samples/sec Loss 3.8788 LearningRate 0.0227 Epoch: 10 Global Step: 130130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:27,667-Speed 3351.71 samples/sec Loss 3.8447 LearningRate 0.0227 Epoch: 10 Global Step: 130140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:30,722-Speed 3353.51 samples/sec Loss 3.8727 LearningRate 0.0227 Epoch: 10 Global Step: 130150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:33,818-Speed 3308.05 samples/sec Loss 3.8449 LearningRate 0.0227 Epoch: 10 Global Step: 130160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:36,916-Speed 3305.86 samples/sec Loss 3.9414 LearningRate 0.0227 Epoch: 10 Global Step: 130170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:39,978-Speed 3346.25 
samples/sec Loss 3.8345 LearningRate 0.0227 Epoch: 10 Global Step: 130180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:43,028-Speed 3358.14 samples/sec Loss 3.7623 LearningRate 0.0227 Epoch: 10 Global Step: 130190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:46,080-Speed 3356.24 samples/sec Loss 3.9460 LearningRate 0.0226 Epoch: 10 Global Step: 130200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:49,202-Speed 3281.02 samples/sec Loss 3.9261 LearningRate 0.0226 Epoch: 10 Global Step: 130210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:52,274-Speed 3333.96 samples/sec Loss 3.8547 LearningRate 0.0226 Epoch: 10 Global Step: 130220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:55,326-Speed 3356.58 samples/sec Loss 3.9252 LearningRate 0.0226 Epoch: 10 Global Step: 130230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 12:59:58,388-Speed 3345.34 samples/sec Loss 3.8817 LearningRate 0.0226 Epoch: 10 Global Step: 130240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:00:01,483-Speed 3309.39 samples/sec Loss 3.9916 LearningRate 0.0226 Epoch: 10 Global Step: 130250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:00:04,604-Speed 3281.82 samples/sec Loss 3.9208 LearningRate 0.0226 Epoch: 10 Global Step: 130260 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:00:07,732-Speed 3275.20 samples/sec Loss 3.9424 LearningRate 0.0226 Epoch: 10 Global Step: 130270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:00:10,777-Speed 3363.47 samples/sec Loss 3.8771 LearningRate 0.0226 Epoch: 10 Global Step: 130280 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:00:13,825-Speed 3361.18 samples/sec Loss 3.8452 LearningRate 0.0226 Epoch: 10 Global Step: 130290 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:00:16,883-Speed 3350.26 samples/sec Loss 3.9542 
LearningRate 0.0226 Epoch: 10 Global Step: 130300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:19,952-Speed 3336.67 samples/sec Loss 3.8612 LearningRate 0.0226 Epoch: 10 Global Step: 130310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:23,008-Speed 3352.35 samples/sec Loss 3.7919 LearningRate 0.0226 Epoch: 10 Global Step: 130320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:26,073-Speed 3342.12 samples/sec Loss 3.9889 LearningRate 0.0226 Epoch: 10 Global Step: 130330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:29,225-Speed 3249.63 samples/sec Loss 3.8470 LearningRate 0.0226 Epoch: 10 Global Step: 130340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:32,280-Speed 3353.66 samples/sec Loss 3.8948 LearningRate 0.0226 Epoch: 10 Global Step: 130350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:35,362-Speed 3323.24 samples/sec Loss 3.9275 LearningRate 0.0226 Epoch: 10 Global Step: 130360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:38,494-Speed 3270.32 samples/sec Loss 3.9009 LearningRate 0.0226 Epoch: 10 Global Step: 130370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:41,576-Speed 3323.05 samples/sec Loss 3.9323 LearningRate 0.0226 Epoch: 10 Global Step: 130380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:44,659-Speed 3322.86 samples/sec Loss 3.8382 LearningRate 0.0226 Epoch: 10 Global Step: 130390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:47,779-Speed 3283.25 samples/sec Loss 3.8327 LearningRate 0.0226 Epoch: 10 Global Step: 130400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:00:50,831-Speed 3355.55 samples/sec Loss 3.9490 LearningRate 0.0226 Epoch: 10 Global Step: 130410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:54,012-Speed 3220.57 samples/sec Loss 3.8233 LearningRate 0.0226 Epoch: 10 
Global Step: 130420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:00:57,079-Speed 3340.02 samples/sec Loss 3.8600 LearningRate 0.0226 Epoch: 10 Global Step: 130430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:00,192-Speed 3290.85 samples/sec Loss 3.9429 LearningRate 0.0226 Epoch: 10 Global Step: 130440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:03,255-Speed 3343.59 samples/sec Loss 3.8860 LearningRate 0.0226 Epoch: 10 Global Step: 130450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:06,328-Speed 3334.05 samples/sec Loss 3.9096 LearningRate 0.0225 Epoch: 10 Global Step: 130460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:09,414-Speed 3319.18 samples/sec Loss 3.8503 LearningRate 0.0225 Epoch: 10 Global Step: 130470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:12,508-Speed 3310.54 samples/sec Loss 3.7978 LearningRate 0.0225 Epoch: 10 Global Step: 130480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:15,578-Speed 3336.00 samples/sec Loss 3.8989 LearningRate 0.0225 Epoch: 10 Global Step: 130490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:18,631-Speed 3355.96 samples/sec Loss 3.9028 LearningRate 0.0225 Epoch: 10 Global Step: 130500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:21,688-Speed 3350.57 samples/sec Loss 3.8008 LearningRate 0.0225 Epoch: 10 Global Step: 130510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:24,791-Speed 3300.54 samples/sec Loss 3.8471 LearningRate 0.0225 Epoch: 10 Global Step: 130520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:27,991-Speed 3201.02 samples/sec Loss 3.8659 LearningRate 0.0225 Epoch: 10 Global Step: 130530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:31,134-Speed 3259.91 samples/sec Loss 3.9121 LearningRate 0.0225 Epoch: 10 Global Step: 130540 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:34,185-Speed 3356.93 samples/sec Loss 3.8427 LearningRate 0.0225 Epoch: 10 Global Step: 130550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:01:37,287-Speed 3302.34 samples/sec Loss 3.9076 LearningRate 0.0225 Epoch: 10 Global Step: 130560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:40,387-Speed 3303.45 samples/sec Loss 3.9277 LearningRate 0.0225 Epoch: 10 Global Step: 130570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:43,460-Speed 3333.93 samples/sec Loss 3.8179 LearningRate 0.0225 Epoch: 10 Global Step: 130580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:46,561-Speed 3303.21 samples/sec Loss 3.9246 LearningRate 0.0225 Epoch: 10 Global Step: 130590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:49,695-Speed 3268.40 samples/sec Loss 3.8896 LearningRate 0.0225 Epoch: 10 Global Step: 130600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:52,815-Speed 3282.89 samples/sec Loss 3.9425 LearningRate 0.0225 Epoch: 10 Global Step: 130610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:55,895-Speed 3325.86 samples/sec Loss 3.9774 LearningRate 0.0225 Epoch: 10 Global Step: 130620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:01:58,998-Speed 3300.52 samples/sec Loss 3.8434 LearningRate 0.0225 Epoch: 10 Global Step: 130630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:02,079-Speed 3325.68 samples/sec Loss 3.8583 LearningRate 0.0225 Epoch: 10 Global Step: 130640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:05,238-Speed 3241.68 samples/sec Loss 3.9429 LearningRate 0.0225 Epoch: 10 Global Step: 130650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:08,347-Speed 3295.00 samples/sec Loss 3.8543 LearningRate 0.0225 Epoch: 10 Global Step: 130660 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 13:02:11,398-Speed 3357.66 samples/sec Loss 3.8565 LearningRate 0.0225 Epoch: 10 Global Step: 130670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:14,515-Speed 3285.58 samples/sec Loss 3.8654 LearningRate 0.0225 Epoch: 10 Global Step: 130680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:17,700-Speed 3216.65 samples/sec Loss 3.7809 LearningRate 0.0225 Epoch: 10 Global Step: 130690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:20,821-Speed 3282.27 samples/sec Loss 3.8581 LearningRate 0.0225 Epoch: 10 Global Step: 130700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:23,903-Speed 3323.34 samples/sec Loss 3.8421 LearningRate 0.0225 Epoch: 10 Global Step: 130710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:27,079-Speed 3225.36 samples/sec Loss 3.8587 LearningRate 0.0224 Epoch: 10 Global Step: 130720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:02:30,136-Speed 3351.07 samples/sec Loss 3.9629 LearningRate 0.0224 Epoch: 10 Global Step: 130730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:02:33,172-Speed 3373.65 samples/sec Loss 3.9536 LearningRate 0.0224 Epoch: 10 Global Step: 130740 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:02:36,273-Speed 3303.43 samples/sec Loss 3.8671 LearningRate 0.0224 Epoch: 10 Global Step: 130750 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:02:39,344-Speed 3335.19 samples/sec Loss 3.8950 LearningRate 0.0224 Epoch: 10 Global Step: 130760 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:02:42,430-Speed 3319.23 samples/sec Loss 3.8367 LearningRate 0.0224 Epoch: 10 Global Step: 130770 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:02:45,483-Speed 3354.73 samples/sec Loss 3.9089 LearningRate 0.0224 Epoch: 10 Global Step: 130780 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 
13:02:48,588-Speed 3299.15 samples/sec Loss 3.9356 LearningRate 0.0224 Epoch: 10 Global Step: 130790 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:02:51,684-Speed 3308.11 samples/sec Loss 3.8305 LearningRate 0.0224 Epoch: 10 Global Step: 130800 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:02:54,787-Speed 3301.19 samples/sec Loss 3.8506 LearningRate 0.0224 Epoch: 10 Global Step: 130810 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:02:57,896-Speed 3294.32 samples/sec Loss 3.9349 LearningRate 0.0224 Epoch: 10 Global Step: 130820 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:03:01,051-Speed 3247.32 samples/sec Loss 3.8770 LearningRate 0.0224 Epoch: 10 Global Step: 130830 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:03:04,147-Speed 3309.25 samples/sec Loss 3.8703 LearningRate 0.0224 Epoch: 10 Global Step: 130840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:07,276-Speed 3273.09 samples/sec Loss 3.8439 LearningRate 0.0224 Epoch: 10 Global Step: 130850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:10,340-Speed 3343.31 samples/sec Loss 3.9543 LearningRate 0.0224 Epoch: 10 Global Step: 130860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:13,490-Speed 3252.09 samples/sec Loss 3.9192 LearningRate 0.0224 Epoch: 10 Global Step: 130870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:16,637-Speed 3254.70 samples/sec Loss 3.8286 LearningRate 0.0224 Epoch: 10 Global Step: 130880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:19,717-Speed 3325.09 samples/sec Loss 3.9466 LearningRate 0.0224 Epoch: 10 Global Step: 130890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:22,791-Speed 3332.16 samples/sec Loss 4.0185 LearningRate 0.0224 Epoch: 10 Global Step: 130900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:25,965-Speed 3227.96 
samples/sec Loss 3.9388 LearningRate 0.0224 Epoch: 10 Global Step: 130910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:29,138-Speed 3227.94 samples/sec Loss 3.9412 LearningRate 0.0224 Epoch: 10 Global Step: 130920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:32,287-Speed 3253.59 samples/sec Loss 3.9272 LearningRate 0.0224 Epoch: 10 Global Step: 130930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:35,365-Speed 3327.07 samples/sec Loss 3.9177 LearningRate 0.0224 Epoch: 10 Global Step: 130940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:03:38,515-Speed 3251.24 samples/sec Loss 3.8872 LearningRate 0.0224 Epoch: 10 Global Step: 130950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:03:41,712-Speed 3204.96 samples/sec Loss 3.9465 LearningRate 0.0224 Epoch: 10 Global Step: 130960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:03:44,839-Speed 3275.62 samples/sec Loss 3.9229 LearningRate 0.0224 Epoch: 10 Global Step: 130970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:03:47,898-Speed 3348.10 samples/sec Loss 3.8905 LearningRate 0.0223 Epoch: 10 Global Step: 130980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:03:50,963-Speed 3342.20 samples/sec Loss 3.9166 LearningRate 0.0223 Epoch: 10 Global Step: 130990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:54,075-Speed 3291.26 samples/sec Loss 3.8962 LearningRate 0.0223 Epoch: 10 Global Step: 131000 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:03:57,165-Speed 3315.72 samples/sec Loss 3.9039 LearningRate 0.0223 Epoch: 10 Global Step: 131010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:00,282-Speed 3285.98 samples/sec Loss 3.8451 LearningRate 0.0223 Epoch: 10 Global Step: 131020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:03,354-Speed 3334.30 samples/sec Loss 3.9710 
LearningRate 0.0223 Epoch: 10 Global Step: 131030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:06,454-Speed 3303.96 samples/sec Loss 3.9180 LearningRate 0.0223 Epoch: 10 Global Step: 131040 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:09,578-Speed 3278.69 samples/sec Loss 3.8662 LearningRate 0.0223 Epoch: 10 Global Step: 131050 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:12,681-Speed 3302.33 samples/sec Loss 3.9038 LearningRate 0.0223 Epoch: 10 Global Step: 131060 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:15,778-Speed 3306.65 samples/sec Loss 3.9201 LearningRate 0.0223 Epoch: 10 Global Step: 131070 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:18,943-Speed 3236.40 samples/sec Loss 3.9032 LearningRate 0.0223 Epoch: 10 Global Step: 131080 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:22,009-Speed 3341.84 samples/sec Loss 3.9790 LearningRate 0.0223 Epoch: 10 Global Step: 131090 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:25,136-Speed 3275.11 samples/sec Loss 3.8169 LearningRate 0.0223 Epoch: 10 Global Step: 131100 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:28,320-Speed 3217.35 samples/sec Loss 3.9174 LearningRate 0.0223 Epoch: 10 Global Step: 131110 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:31,415-Speed 3309.87 samples/sec Loss 3.9084 LearningRate 0.0223 Epoch: 10 Global Step: 131120 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:34,551-Speed 3265.33 samples/sec Loss 3.8579 LearningRate 0.0223 Epoch: 10 Global Step: 131130 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:04:37,687-Speed 3267.08 samples/sec Loss 4.0366 LearningRate 0.0223 Epoch: 10 Global Step: 131140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:40,800-Speed 3290.59 samples/sec Loss 3.9194 LearningRate 0.0223 Epoch: 10 Global 
Step: 131150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:43,918-Speed 3284.46 samples/sec Loss 3.8955 LearningRate 0.0223 Epoch: 10 Global Step: 131160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:47,006-Speed 3317.37 samples/sec Loss 3.8790 LearningRate 0.0223 Epoch: 10 Global Step: 131170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:50,141-Speed 3267.74 samples/sec Loss 3.9627 LearningRate 0.0223 Epoch: 10 Global Step: 131180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:53,354-Speed 3187.72 samples/sec Loss 3.9092 LearningRate 0.0223 Epoch: 10 Global Step: 131190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:56,416-Speed 3345.05 samples/sec Loss 3.9021 LearningRate 0.0223 Epoch: 10 Global Step: 131200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:04:59,512-Speed 3309.29 samples/sec Loss 3.9781 LearningRate 0.0223 Epoch: 10 Global Step: 131210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:05:02,651-Speed 3262.63 samples/sec Loss 3.9205 LearningRate 0.0223 Epoch: 10 Global Step: 131220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:05:05,803-Speed 3250.03 samples/sec Loss 3.8917 LearningRate 0.0223 Epoch: 10 Global Step: 131230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:05:08,902-Speed 3305.51 samples/sec Loss 3.9564 LearningRate 0.0223 Epoch: 10 Global Step: 131240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:12,026-Speed 3278.40 samples/sec Loss 3.9245 LearningRate 0.0222 Epoch: 10 Global Step: 131250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:15,157-Speed 3272.15 samples/sec Loss 3.9141 LearningRate 0.0222 Epoch: 10 Global Step: 131260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:18,316-Speed 3242.01 samples/sec Loss 3.8677 LearningRate 0.0222 Epoch: 10 Global Step: 131270 Fp16 Grad Scale: 
32768 Required: 10 hours Training: 2022-04-27 13:05:21,395-Speed 3326.25 samples/sec Loss 3.9224 LearningRate 0.0222 Epoch: 10 Global Step: 131280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:24,520-Speed 3278.43 samples/sec Loss 3.9477 LearningRate 0.0222 Epoch: 10 Global Step: 131290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:27,673-Speed 3248.82 samples/sec Loss 3.8971 LearningRate 0.0222 Epoch: 10 Global Step: 131300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:30,788-Speed 3288.14 samples/sec Loss 3.9382 LearningRate 0.0222 Epoch: 10 Global Step: 131310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:33,848-Speed 3346.66 samples/sec Loss 3.9162 LearningRate 0.0222 Epoch: 10 Global Step: 131320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:36,955-Speed 3297.35 samples/sec Loss 3.9225 LearningRate 0.0222 Epoch: 10 Global Step: 131330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:05:40,135-Speed 3221.44 samples/sec Loss 3.9012 LearningRate 0.0222 Epoch: 10 Global Step: 131340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:05:43,285-Speed 3251.68 samples/sec Loss 3.9100 LearningRate 0.0222 Epoch: 10 Global Step: 131350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:05:46,400-Speed 3288.91 samples/sec Loss 3.8985 LearningRate 0.0222 Epoch: 10 Global Step: 131360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:05:49,473-Speed 3332.63 samples/sec Loss 3.8635 LearningRate 0.0222 Epoch: 10 Global Step: 131370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:05:52,604-Speed 3272.09 samples/sec Loss 3.8692 LearningRate 0.0222 Epoch: 10 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:05:55,755-Speed 3250.81 samples/sec Loss 4.0099 LearningRate 0.0222 Epoch: 10 Global Step: 131390 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 13:05:58,833-Speed 3327.42 samples/sec Loss 3.9206 LearningRate 0.0222 Epoch: 10 Global Step: 131400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:01,994-Speed 3240.21 samples/sec Loss 3.9370 LearningRate 0.0222 Epoch: 10 Global Step: 131410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:05,178-Speed 3217.19 samples/sec Loss 3.9601 LearningRate 0.0222 Epoch: 10 Global Step: 131420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:08,313-Speed 3267.47 samples/sec Loss 3.9021 LearningRate 0.0222 Epoch: 10 Global Step: 131430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:11,452-Speed 3263.81 samples/sec Loss 3.9557 LearningRate 0.0222 Epoch: 10 Global Step: 131440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:14,587-Speed 3267.05 samples/sec Loss 3.9359 LearningRate 0.0222 Epoch: 10 Global Step: 131450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:17,686-Speed 3305.26 samples/sec Loss 3.8919 LearningRate 0.0222 Epoch: 10 Global Step: 131460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:20,817-Speed 3270.94 samples/sec Loss 3.8188 LearningRate 0.0222 Epoch: 10 Global Step: 131470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:23,959-Speed 3260.59 samples/sec Loss 3.9520 LearningRate 0.0222 Epoch: 10 Global Step: 131480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:27,066-Speed 3296.78 samples/sec Loss 3.9298 LearningRate 0.0222 Epoch: 10 Global Step: 131490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:06:30,214-Speed 3253.91 samples/sec Loss 3.9694 LearningRate 0.0222 Epoch: 10 Global Step: 131500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:33,307-Speed 3311.45 samples/sec Loss 3.8164 LearningRate 0.0221 Epoch: 10 Global Step: 131510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
13:06:36,522-Speed 3186.28 samples/sec Loss 3.9718 LearningRate 0.0221 Epoch: 10 Global Step: 131520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:39,599-Speed 3329.57 samples/sec Loss 3.8892 LearningRate 0.0221 Epoch: 10 Global Step: 131530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:06:42,731-Speed 3270.40 samples/sec Loss 3.8836 LearningRate 0.0221 Epoch: 10 Global Step: 131540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:06:45,783-Speed 3355.67 samples/sec Loss 3.9373 LearningRate 0.0221 Epoch: 10 Global Step: 131550 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:06:48,881-Speed 3306.95 samples/sec Loss 3.9218 LearningRate 0.0221 Epoch: 10 Global Step: 131560 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:06:52,027-Speed 3255.78 samples/sec Loss 3.9019 LearningRate 0.0221 Epoch: 10 Global Step: 131570 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:06:55,217-Speed 3210.84 samples/sec Loss 3.8199 LearningRate 0.0221 Epoch: 10 Global Step: 131580 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:06:58,304-Speed 3318.33 samples/sec Loss 3.9088 LearningRate 0.0221 Epoch: 10 Global Step: 131590 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:07:01,451-Speed 3254.67 samples/sec Loss 3.9665 LearningRate 0.0221 Epoch: 10 Global Step: 131600 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:07:04,586-Speed 3266.97 samples/sec Loss 3.8785 LearningRate 0.0221 Epoch: 10 Global Step: 131610 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:07:07,688-Speed 3302.86 samples/sec Loss 3.9358 LearningRate 0.0221 Epoch: 10 Global Step: 131620 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:07:10,741-Speed 3354.68 samples/sec Loss 3.8629 LearningRate 0.0221 Epoch: 10 Global Step: 131630 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:07:13,904-Speed 3239.01 samples/sec 
Loss 3.8943 LearningRate 0.0221 Epoch: 10 Global Step: 131640 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:07:17,047-Speed 3258.83 samples/sec Loss 4.0163 LearningRate 0.0221 Epoch: 10 Global Step: 131650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:20,165-Speed 3285.55 samples/sec Loss 3.9947 LearningRate 0.0221 Epoch: 10 Global Step: 131660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:23,260-Speed 3309.27 samples/sec Loss 3.9330 LearningRate 0.0221 Epoch: 10 Global Step: 131670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:26,379-Speed 3284.33 samples/sec Loss 3.8673 LearningRate 0.0221 Epoch: 10 Global Step: 131680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:29,490-Speed 3292.53 samples/sec Loss 3.8917 LearningRate 0.0221 Epoch: 10 Global Step: 131690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:32,549-Speed 3349.21 samples/sec Loss 3.9245 LearningRate 0.0221 Epoch: 10 Global Step: 131700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:35,714-Speed 3236.36 samples/sec Loss 3.9239 LearningRate 0.0221 Epoch: 10 Global Step: 131710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:38,802-Speed 3316.93 samples/sec Loss 3.9167 LearningRate 0.0221 Epoch: 10 Global Step: 131720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:41,939-Speed 3265.07 samples/sec Loss 3.8791 LearningRate 0.0221 Epoch: 10 Global Step: 131730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:45,025-Speed 3319.23 samples/sec Loss 3.9315 LearningRate 0.0221 Epoch: 10 Global Step: 131740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:07:48,093-Speed 3338.95 samples/sec Loss 3.9835 LearningRate 0.0221 Epoch: 10 Global Step: 131750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:07:51,205-Speed 3291.10 samples/sec Loss 3.9729 LearningRate 0.0221 
Epoch: 10 Global Step: 131760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:07:54,354-Speed 3252.84 samples/sec Loss 3.8210 LearningRate 0.0220 Epoch: 10 Global Step: 131770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:07:57,427-Speed 3333.56 samples/sec Loss 3.8237 LearningRate 0.0220 Epoch: 10 Global Step: 131780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:00,550-Speed 3279.50 samples/sec Loss 3.8984 LearningRate 0.0220 Epoch: 10 Global Step: 131790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:03,641-Speed 3313.71 samples/sec Loss 3.8747 LearningRate 0.0220 Epoch: 10 Global Step: 131800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:06,786-Speed 3256.97 samples/sec Loss 3.9058 LearningRate 0.0220 Epoch: 10 Global Step: 131810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:09,905-Speed 3285.20 samples/sec Loss 3.8713 LearningRate 0.0220 Epoch: 10 Global Step: 131820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:12,998-Speed 3311.16 samples/sec Loss 3.9283 LearningRate 0.0220 Epoch: 10 Global Step: 131830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:16,125-Speed 3275.50 samples/sec Loss 3.9170 LearningRate 0.0220 Epoch: 10 Global Step: 131840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:19,213-Speed 3317.18 samples/sec Loss 3.9329 LearningRate 0.0220 Epoch: 10 Global Step: 131850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:22,302-Speed 3316.52 samples/sec Loss 3.9539 LearningRate 0.0220 Epoch: 10 Global Step: 131860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:25,482-Speed 3221.14 samples/sec Loss 3.9266 LearningRate 0.0220 Epoch: 10 Global Step: 131870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:28,564-Speed 3323.33 samples/sec Loss 4.0532 LearningRate 0.0220 Epoch: 10 Global Step: 131880 
Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:31,735-Speed 3229.97 samples/sec Loss 3.9314 LearningRate 0.0220 Epoch: 10 Global Step: 131890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:34,838-Speed 3301.23 samples/sec Loss 3.9445 LearningRate 0.0220 Epoch: 10 Global Step: 131900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:37,908-Speed 3336.36 samples/sec Loss 3.9907 LearningRate 0.0220 Epoch: 10 Global Step: 131910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:41,049-Speed 3261.42 samples/sec Loss 3.9588 LearningRate 0.0220 Epoch: 10 Global Step: 131920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:44,135-Speed 3319.14 samples/sec Loss 4.0303 LearningRate 0.0220 Epoch: 10 Global Step: 131930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:47,221-Speed 3319.34 samples/sec Loss 3.9259 LearningRate 0.0220 Epoch: 10 Global Step: 131940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:50,350-Speed 3272.87 samples/sec Loss 3.9588 LearningRate 0.0220 Epoch: 10 Global Step: 131950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:53,506-Speed 3245.95 samples/sec Loss 3.8982 LearningRate 0.0220 Epoch: 10 Global Step: 131960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:08:56,609-Speed 3301.33 samples/sec Loss 3.9314 LearningRate 0.0220 Epoch: 10 Global Step: 131970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:08:59,684-Speed 3330.38 samples/sec Loss 3.9328 LearningRate 0.0220 Epoch: 10 Global Step: 131980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:02,778-Speed 3311.61 samples/sec Loss 3.9128 LearningRate 0.0220 Epoch: 10 Global Step: 131990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:05,957-Speed 3222.10 samples/sec Loss 3.9284 LearningRate 0.0220 Epoch: 10 Global Step: 132000 Fp16 Grad Scale: 16384 
Required: 10 hours Training: 2022-04-27 13:09:09,059-Speed 3301.94 samples/sec Loss 3.9924 LearningRate 0.0220 Epoch: 10 Global Step: 132010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:12,164-Speed 3298.81 samples/sec Loss 4.0486 LearningRate 0.0220 Epoch: 10 Global Step: 132020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:15,293-Speed 3273.52 samples/sec Loss 3.9510 LearningRate 0.0220 Epoch: 10 Global Step: 132030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:18,409-Speed 3287.76 samples/sec Loss 3.9481 LearningRate 0.0219 Epoch: 10 Global Step: 132040 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:21,500-Speed 3313.96 samples/sec Loss 3.9534 LearningRate 0.0219 Epoch: 10 Global Step: 132050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:24,606-Speed 3297.28 samples/sec Loss 3.9387 LearningRate 0.0219 Epoch: 10 Global Step: 132060 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:09:27,756-Speed 3252.19 samples/sec Loss 4.0776 LearningRate 0.0219 Epoch: 10 Global Step: 132070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:30,902-Speed 3255.79 samples/sec Loss 3.9158 LearningRate 0.0219 Epoch: 10 Global Step: 132080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:33,962-Speed 3347.40 samples/sec Loss 3.8361 LearningRate 0.0219 Epoch: 10 Global Step: 132090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:37,075-Speed 3290.21 samples/sec Loss 3.9421 LearningRate 0.0219 Epoch: 10 Global Step: 132100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:40,740-Speed 2794.80 samples/sec Loss 4.1094 LearningRate 0.0219 Epoch: 10 Global Step: 132110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:43,845-Speed 3298.45 samples/sec Loss 3.9214 LearningRate 0.0219 Epoch: 10 Global Step: 132120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 
2022-04-27 13:09:46,921-Speed 3330.81 samples/sec Loss 4.0333 LearningRate 0.0219 Epoch: 10 Global Step: 132130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:50,025-Speed 3300.17 samples/sec Loss 4.0111 LearningRate 0.0219 Epoch: 10 Global Step: 132140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:53,159-Speed 3267.79 samples/sec Loss 3.9720 LearningRate 0.0219 Epoch: 10 Global Step: 132150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:56,272-Speed 3290.19 samples/sec Loss 3.9460 LearningRate 0.0219 Epoch: 10 Global Step: 132160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:09:59,328-Speed 3352.44 samples/sec Loss 4.0041 LearningRate 0.0219 Epoch: 10 Global Step: 132170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:02,470-Speed 3260.12 samples/sec Loss 3.9470 LearningRate 0.0219 Epoch: 10 Global Step: 132180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:05,606-Speed 3266.72 samples/sec Loss 3.9984 LearningRate 0.0219 Epoch: 10 Global Step: 132190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:08,648-Speed 3367.13 samples/sec Loss 3.9784 LearningRate 0.0219 Epoch: 10 Global Step: 132200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:11,758-Speed 3293.78 samples/sec Loss 4.0192 LearningRate 0.0219 Epoch: 10 Global Step: 132210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:14,912-Speed 3247.84 samples/sec Loss 4.0240 LearningRate 0.0219 Epoch: 10 Global Step: 132220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:18,055-Speed 3259.34 samples/sec Loss 3.9741 LearningRate 0.0219 Epoch: 10 Global Step: 132230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:21,154-Speed 3305.10 samples/sec Loss 3.8956 LearningRate 0.0219 Epoch: 10 Global Step: 132240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:24,244-Speed 
3314.76 samples/sec Loss 4.0062 LearningRate 0.0219 Epoch: 10 Global Step: 132250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:27,384-Speed 3261.90 samples/sec Loss 3.8855 LearningRate 0.0219 Epoch: 10 Global Step: 132260 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:30,630-Speed 3156.71 samples/sec Loss 3.8897 LearningRate 0.0219 Epoch: 10 Global Step: 132270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:33,689-Speed 3347.31 samples/sec Loss 3.8394 LearningRate 0.0219 Epoch: 10 Global Step: 132280 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:36,779-Speed 3315.32 samples/sec Loss 3.9911 LearningRate 0.0219 Epoch: 10 Global Step: 132290 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:10:39,872-Speed 3312.76 samples/sec Loss 4.0215 LearningRate 0.0218 Epoch: 10 Global Step: 132300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:43,006-Speed 3267.98 samples/sec Loss 3.9467 LearningRate 0.0218 Epoch: 10 Global Step: 132310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:46,083-Speed 3328.47 samples/sec Loss 3.8645 LearningRate 0.0218 Epoch: 10 Global Step: 132320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:49,208-Speed 3278.44 samples/sec Loss 3.9511 LearningRate 0.0218 Epoch: 10 Global Step: 132330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:52,319-Speed 3291.97 samples/sec Loss 4.0104 LearningRate 0.0218 Epoch: 10 Global Step: 132340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:55,421-Speed 3302.32 samples/sec Loss 3.8903 LearningRate 0.0218 Epoch: 10 Global Step: 132350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:10:58,535-Speed 3290.12 samples/sec Loss 4.0693 LearningRate 0.0218 Epoch: 10 Global Step: 132360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:01,644-Speed 3294.72 samples/sec Loss 3.8864 
LearningRate 0.0218 Epoch: 10 Global Step: 132370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:04,729-Speed 3320.37 samples/sec Loss 3.9689 LearningRate 0.0218 Epoch: 10 Global Step: 132380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:07,783-Speed 3353.29 samples/sec Loss 3.9313 LearningRate 0.0218 Epoch: 10 Global Step: 132390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:10,856-Speed 3333.48 samples/sec Loss 3.9195 LearningRate 0.0218 Epoch: 10 Global Step: 132400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:11:13,970-Speed 3289.41 samples/sec Loss 3.9141 LearningRate 0.0218 Epoch: 10 Global Step: 132410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:17,166-Speed 3204.80 samples/sec Loss 3.9543 LearningRate 0.0218 Epoch: 10 Global Step: 132420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:20,310-Speed 3257.99 samples/sec Loss 3.9896 LearningRate 0.0218 Epoch: 10 Global Step: 132430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:23,414-Speed 3300.57 samples/sec Loss 3.9024 LearningRate 0.0218 Epoch: 10 Global Step: 132440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:26,530-Speed 3287.09 samples/sec Loss 3.9361 LearningRate 0.0218 Epoch: 10 Global Step: 132450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:29,681-Speed 3250.39 samples/sec Loss 3.9298 LearningRate 0.0218 Epoch: 10 Global Step: 132460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:32,745-Speed 3343.33 samples/sec Loss 4.0049 LearningRate 0.0218 Epoch: 10 Global Step: 132470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:35,869-Speed 3278.76 samples/sec Loss 3.9699 LearningRate 0.0218 Epoch: 10 Global Step: 132480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:38,939-Speed 3337.29 samples/sec Loss 3.9145 LearningRate 0.0218 Epoch: 10 
Global Step: 132490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:42,032-Speed 3311.91 samples/sec Loss 4.0061 LearningRate 0.0218 Epoch: 10 Global Step: 132500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:45,112-Speed 3325.58 samples/sec Loss 3.9130 LearningRate 0.0218 Epoch: 10 Global Step: 132510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:11:48,216-Speed 3300.59 samples/sec Loss 4.0498 LearningRate 0.0218 Epoch: 10 Global Step: 132520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:51,284-Speed 3337.96 samples/sec Loss 3.7764 LearningRate 0.0218 Epoch: 10 Global Step: 132530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:54,384-Speed 3303.94 samples/sec Loss 3.8534 LearningRate 0.0218 Epoch: 10 Global Step: 132540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:11:57,459-Speed 3331.75 samples/sec Loss 3.8954 LearningRate 0.0218 Epoch: 10 Global Step: 132550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:00,538-Speed 3326.71 samples/sec Loss 4.0054 LearningRate 0.0218 Epoch: 10 Global Step: 132560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:03,592-Speed 3354.24 samples/sec Loss 3.8980 LearningRate 0.0217 Epoch: 10 Global Step: 132570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:06,730-Speed 3263.51 samples/sec Loss 3.9937 LearningRate 0.0217 Epoch: 10 Global Step: 132580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:09,789-Speed 3349.94 samples/sec Loss 3.9339 LearningRate 0.0217 Epoch: 10 Global Step: 132590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:12,840-Speed 3356.84 samples/sec Loss 3.9608 LearningRate 0.0217 Epoch: 10 Global Step: 132600 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:15,929-Speed 3316.56 samples/sec Loss 3.9791 LearningRate 0.0217 Epoch: 10 Global Step: 132610 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:19,019-Speed 3314.32 samples/sec Loss 4.0300 LearningRate 0.0217 Epoch: 10 Global Step: 132620 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:22,091-Speed 3335.14 samples/sec Loss 4.0523 LearningRate 0.0217 Epoch: 10 Global Step: 132630 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:25,154-Speed 3344.09 samples/sec Loss 3.9488 LearningRate 0.0217 Epoch: 10 Global Step: 132640 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:28,351-Speed 3204.64 samples/sec Loss 3.9708 LearningRate 0.0217 Epoch: 10 Global Step: 132650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:31,448-Speed 3307.14 samples/sec Loss 3.9306 LearningRate 0.0217 Epoch: 10 Global Step: 132660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:34,528-Speed 3326.29 samples/sec Loss 4.0244 LearningRate 0.0217 Epoch: 10 Global Step: 132670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:37,658-Speed 3272.79 samples/sec Loss 3.9568 LearningRate 0.0217 Epoch: 10 Global Step: 132680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:40,735-Speed 3329.30 samples/sec Loss 3.9304 LearningRate 0.0217 Epoch: 10 Global Step: 132690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:12:43,917-Speed 3218.27 samples/sec Loss 3.8680 LearningRate 0.0217 Epoch: 10 Global Step: 132700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:47,013-Speed 3309.26 samples/sec Loss 3.9638 LearningRate 0.0217 Epoch: 10 Global Step: 132710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:50,091-Speed 3328.07 samples/sec Loss 3.9044 LearningRate 0.0217 Epoch: 10 Global Step: 132720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:53,270-Speed 3221.51 samples/sec Loss 4.0229 LearningRate 0.0217 Epoch: 10 Global Step: 132730 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 13:12:56,373-Speed 3300.67 samples/sec Loss 3.9435 LearningRate 0.0217 Epoch: 10 Global Step: 132740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:12:59,495-Speed 3281.57 samples/sec Loss 3.8142 LearningRate 0.0217 Epoch: 10 Global Step: 132750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:02,688-Speed 3208.59 samples/sec Loss 3.9713 LearningRate 0.0217 Epoch: 10 Global Step: 132760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:05,839-Speed 3250.65 samples/sec Loss 3.9598 LearningRate 0.0217 Epoch: 10 Global Step: 132770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:08,917-Speed 3327.83 samples/sec Loss 3.9601 LearningRate 0.0217 Epoch: 10 Global Step: 132780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:11,979-Speed 3345.07 samples/sec Loss 3.9911 LearningRate 0.0217 Epoch: 10 Global Step: 132790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:15,159-Speed 3221.05 samples/sec Loss 3.8462 LearningRate 0.0217 Epoch: 10 Global Step: 132800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:18,292-Speed 3269.45 samples/sec Loss 3.9966 LearningRate 0.0217 Epoch: 10 Global Step: 132810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:21,362-Speed 3337.18 samples/sec Loss 3.9283 LearningRate 0.0217 Epoch: 10 Global Step: 132820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:24,471-Speed 3294.46 samples/sec Loss 3.9526 LearningRate 0.0217 Epoch: 10 Global Step: 132830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:27,573-Speed 3302.52 samples/sec Loss 3.9732 LearningRate 0.0216 Epoch: 10 Global Step: 132840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:13:30,650-Speed 3328.95 samples/sec Loss 3.8485 LearningRate 0.0216 Epoch: 10 Global Step: 132850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
13:13:33,706-Speed 3352.00 samples/sec Loss 3.8922 LearningRate 0.0216 Epoch: 10 Global Step: 132860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:36,797-Speed 3313.24 samples/sec Loss 3.9679 LearningRate 0.0216 Epoch: 10 Global Step: 132870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:39,860-Speed 3344.17 samples/sec Loss 4.0400 LearningRate 0.0216 Epoch: 10 Global Step: 132880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:42,961-Speed 3304.00 samples/sec Loss 3.9652 LearningRate 0.0216 Epoch: 10 Global Step: 132890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:46,030-Speed 3337.37 samples/sec Loss 3.9879 LearningRate 0.0216 Epoch: 10 Global Step: 132900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:49,133-Speed 3301.71 samples/sec Loss 3.9515 LearningRate 0.0216 Epoch: 10 Global Step: 132910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:52,278-Speed 3256.28 samples/sec Loss 3.9437 LearningRate 0.0216 Epoch: 10 Global Step: 132920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:55,377-Speed 3306.30 samples/sec Loss 3.9211 LearningRate 0.0216 Epoch: 10 Global Step: 132930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:13:58,487-Speed 3292.89 samples/sec Loss 3.9381 LearningRate 0.0216 Epoch: 10 Global Step: 132940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:01,621-Speed 3268.84 samples/sec Loss 3.8976 LearningRate 0.0216 Epoch: 10 Global Step: 132950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:04,767-Speed 3255.54 samples/sec Loss 4.0478 LearningRate 0.0216 Epoch: 10 Global Step: 132960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:14:07,886-Speed 3284.90 samples/sec Loss 3.9369 LearningRate 0.0216 Epoch: 10 Global Step: 132970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:14:10,958-Speed 3333.72 
samples/sec Loss 3.9602 LearningRate 0.0216 Epoch: 10 Global Step: 132980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:14:14,035-Speed 3329.77 samples/sec Loss 3.9557 LearningRate 0.0216 Epoch: 10 Global Step: 132990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:14:17,107-Speed 3334.30 samples/sec Loss 3.9374 LearningRate 0.0216 Epoch: 10 Global Step: 133000 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:20,182-Speed 3331.60 samples/sec Loss 3.9740 LearningRate 0.0216 Epoch: 10 Global Step: 133010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:23,241-Speed 3347.94 samples/sec Loss 3.9086 LearningRate 0.0216 Epoch: 10 Global Step: 133020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:26,411-Speed 3231.97 samples/sec Loss 4.0360 LearningRate 0.0216 Epoch: 10 Global Step: 133030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:29,545-Speed 3267.58 samples/sec Loss 3.9396 LearningRate 0.0216 Epoch: 10 Global Step: 133040 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:32,688-Speed 3258.73 samples/sec Loss 4.0255 LearningRate 0.0216 Epoch: 10 Global Step: 133050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:35,775-Speed 3318.63 samples/sec Loss 3.8993 LearningRate 0.0216 Epoch: 10 Global Step: 133060 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:38,874-Speed 3305.40 samples/sec Loss 3.9860 LearningRate 0.0216 Epoch: 10 Global Step: 133070 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:42,060-Speed 3214.81 samples/sec Loss 3.9297 LearningRate 0.0216 Epoch: 10 Global Step: 133080 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:45,136-Speed 3330.82 samples/sec Loss 3.8860 LearningRate 0.0216 Epoch: 10 Global Step: 133090 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:14:48,234-Speed 3306.53 samples/sec Loss 3.9923 
LearningRate 0.0215 Epoch: 10 Global Step: 133100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:14:51,324-Speed 3314.21 samples/sec Loss 3.9671 LearningRate 0.0215 Epoch: 10 Global Step: 133110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:14:54,421-Speed 3307.91 samples/sec Loss 3.9223 LearningRate 0.0215 Epoch: 10 Global Step: 133120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:14:57,495-Speed 3332.07 samples/sec Loss 3.9459 LearningRate 0.0215 Epoch: 10 Global Step: 133130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:00,680-Speed 3216.84 samples/sec Loss 4.0038 LearningRate 0.0215 Epoch: 10 Global Step: 133140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:03,851-Speed 3229.54 samples/sec Loss 3.9614 LearningRate 0.0215 Epoch: 10 Global Step: 133150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:07,026-Speed 3226.14 samples/sec Loss 3.9362 LearningRate 0.0215 Epoch: 10 Global Step: 133160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:10,089-Speed 3344.74 samples/sec Loss 3.8813 LearningRate 0.0215 Epoch: 10 Global Step: 133170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:13,258-Speed 3232.76 samples/sec Loss 3.8742 LearningRate 0.0215 Epoch: 10 Global Step: 133180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:16,385-Speed 3275.69 samples/sec Loss 3.9699 LearningRate 0.0215 Epoch: 10 Global Step: 133190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:19,537-Speed 3249.58 samples/sec Loss 4.0010 LearningRate 0.0215 Epoch: 10 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:15:22,627-Speed 3315.03 samples/sec Loss 4.0766 LearningRate 0.0215 Epoch: 10 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:15:25,743-Speed 3287.02 samples/sec Loss 4.0021 LearningRate 0.0215 Epoch: 10 
Global Step: 133220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:15:28,857-Speed 3289.11 samples/sec Loss 3.8942 LearningRate 0.0215 Epoch: 10 Global Step: 133230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:15:31,909-Speed 3356.54 samples/sec Loss 3.9526 LearningRate 0.0215 Epoch: 10 Global Step: 133240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:35,026-Speed 3286.69 samples/sec Loss 3.9529 LearningRate 0.0215 Epoch: 10 Global Step: 133250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:38,157-Speed 3271.83 samples/sec Loss 3.9256 LearningRate 0.0215 Epoch: 10 Global Step: 133260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:41,347-Speed 3209.84 samples/sec Loss 3.9118 LearningRate 0.0215 Epoch: 10 Global Step: 133270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:44,463-Speed 3287.59 samples/sec Loss 3.9095 LearningRate 0.0215 Epoch: 10 Global Step: 133280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:47,573-Speed 3293.28 samples/sec Loss 3.8632 LearningRate 0.0215 Epoch: 10 Global Step: 133290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:50,711-Speed 3265.13 samples/sec Loss 4.0037 LearningRate 0.0215 Epoch: 10 Global Step: 133300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:53,863-Speed 3248.81 samples/sec Loss 3.9731 LearningRate 0.0215 Epoch: 10 Global Step: 133310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:56,942-Speed 3327.51 samples/sec Loss 3.9083 LearningRate 0.0215 Epoch: 10 Global Step: 133320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:15:59,999-Speed 3350.25 samples/sec Loss 3.9935 LearningRate 0.0215 Epoch: 10 Global Step: 133330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:03,114-Speed 3288.14 samples/sec Loss 3.9463 LearningRate 0.0215 Epoch: 10 Global Step: 133340 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-27 13:16:06,215-Speed 3303.28 samples/sec Loss 3.9622 LearningRate 0.0215 Epoch: 10 Global Step: 133350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:16:09,308-Speed 3312.50 samples/sec Loss 3.9767 LearningRate 0.0215 Epoch: 10 Global Step: 133360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:16:12,390-Speed 3323.48 samples/sec Loss 3.9433 LearningRate 0.0214 Epoch: 10 Global Step: 133370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:15,610-Speed 3181.62 samples/sec Loss 4.0103 LearningRate 0.0214 Epoch: 10 Global Step: 133380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:18,696-Speed 3318.46 samples/sec Loss 3.9849 LearningRate 0.0214 Epoch: 10 Global Step: 133390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:21,784-Speed 3318.07 samples/sec Loss 3.9160 LearningRate 0.0214 Epoch: 10 Global Step: 133400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:24,957-Speed 3227.88 samples/sec Loss 3.9226 LearningRate 0.0214 Epoch: 10 Global Step: 133410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:28,024-Speed 3340.33 samples/sec Loss 4.0708 LearningRate 0.0214 Epoch: 10 Global Step: 133420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:31,131-Speed 3296.25 samples/sec Loss 3.9544 LearningRate 0.0214 Epoch: 10 Global Step: 133430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:34,203-Speed 3334.19 samples/sec Loss 3.9048 LearningRate 0.0214 Epoch: 10 Global Step: 133440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:37,266-Speed 3344.33 samples/sec Loss 3.9133 LearningRate 0.0214 Epoch: 10 Global Step: 133450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:40,404-Speed 3264.47 samples/sec Loss 3.9101 LearningRate 0.0214 Epoch: 10 Global Step: 133460 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 13:16:43,530-Speed 3276.88 samples/sec Loss 3.9982 LearningRate 0.0214 Epoch: 10 Global Step: 133470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:16:46,597-Speed 3340.16 samples/sec Loss 3.8768 LearningRate 0.0214 Epoch: 10 Global Step: 133480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:49,676-Speed 3326.66 samples/sec Loss 3.9924 LearningRate 0.0214 Epoch: 10 Global Step: 133490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:52,765-Speed 3316.16 samples/sec Loss 3.9413 LearningRate 0.0214 Epoch: 10 Global Step: 133500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:55,836-Speed 3336.05 samples/sec Loss 3.9591 LearningRate 0.0214 Epoch: 10 Global Step: 133510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:16:58,948-Speed 3291.50 samples/sec Loss 3.9592 LearningRate 0.0214 Epoch: 10 Global Step: 133520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:02,069-Speed 3282.59 samples/sec Loss 3.9998 LearningRate 0.0214 Epoch: 10 Global Step: 133530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:05,178-Speed 3294.35 samples/sec Loss 3.9210 LearningRate 0.0214 Epoch: 10 Global Step: 133540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:08,255-Speed 3329.27 samples/sec Loss 3.9377 LearningRate 0.0214 Epoch: 10 Global Step: 133550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:11,390-Speed 3267.41 samples/sec Loss 3.9449 LearningRate 0.0214 Epoch: 10 Global Step: 133560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:14,486-Speed 3307.95 samples/sec Loss 4.0773 LearningRate 0.0214 Epoch: 10 Global Step: 133570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:17,638-Speed 3249.26 samples/sec Loss 3.9691 LearningRate 0.0214 Epoch: 10 Global Step: 133580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 
13:17:20,719-Speed 3325.86 samples/sec Loss 3.9494 LearningRate 0.0214 Epoch: 10 Global Step: 133590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:17:23,818-Speed 3304.11 samples/sec Loss 3.8680 LearningRate 0.0214 Epoch: 10 Global Step: 133600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:17:26,967-Speed 3253.23 samples/sec Loss 3.9684 LearningRate 0.0214 Epoch: 10 Global Step: 133610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:30,056-Speed 3316.35 samples/sec Loss 3.9727 LearningRate 0.0214 Epoch: 10 Global Step: 133620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:33,186-Speed 3272.00 samples/sec Loss 3.9423 LearningRate 0.0214 Epoch: 10 Global Step: 133630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:36,345-Speed 3243.01 samples/sec Loss 3.8937 LearningRate 0.0213 Epoch: 10 Global Step: 133640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:39,470-Speed 3278.62 samples/sec Loss 3.9673 LearningRate 0.0213 Epoch: 10 Global Step: 133650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:42,607-Speed 3264.75 samples/sec Loss 3.9556 LearningRate 0.0213 Epoch: 10 Global Step: 133660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:45,681-Speed 3332.58 samples/sec Loss 4.0136 LearningRate 0.0213 Epoch: 10 Global Step: 133670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:48,822-Speed 3261.05 samples/sec Loss 3.9534 LearningRate 0.0213 Epoch: 10 Global Step: 133680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:52,010-Speed 3212.88 samples/sec Loss 3.9823 LearningRate 0.0213 Epoch: 10 Global Step: 133690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:17:55,078-Speed 3338.74 samples/sec Loss 3.9280 LearningRate 0.0213 Epoch: 10 Global Step: 133700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:17:58,167-Speed 3315.94 
samples/sec Loss 3.9651 LearningRate 0.0213 Epoch: 10 Global Step: 133710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:01,256-Speed 3316.04 samples/sec Loss 3.9620 LearningRate 0.0213 Epoch: 10 Global Step: 133720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:04,397-Speed 3261.20 samples/sec Loss 4.0191 LearningRate 0.0213 Epoch: 10 Global Step: 133730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:07,506-Speed 3294.98 samples/sec Loss 4.0584 LearningRate 0.0213 Epoch: 10 Global Step: 133740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:10,598-Speed 3312.77 samples/sec Loss 3.9009 LearningRate 0.0213 Epoch: 10 Global Step: 133750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:13,723-Speed 3277.58 samples/sec Loss 3.9151 LearningRate 0.0213 Epoch: 10 Global Step: 133760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:16,873-Speed 3252.24 samples/sec Loss 3.8977 LearningRate 0.0213 Epoch: 10 Global Step: 133770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:19,993-Speed 3282.82 samples/sec Loss 3.9971 LearningRate 0.0213 Epoch: 10 Global Step: 133780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:23,105-Speed 3291.18 samples/sec Loss 3.9402 LearningRate 0.0213 Epoch: 10 Global Step: 133790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:26,248-Speed 3260.09 samples/sec Loss 3.9577 LearningRate 0.0213 Epoch: 10 Global Step: 133800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:18:29,338-Speed 3314.10 samples/sec Loss 3.9763 LearningRate 0.0213 Epoch: 10 Global Step: 133810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:32,452-Speed 3289.84 samples/sec Loss 3.9263 LearningRate 0.0213 Epoch: 10 Global Step: 133820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:35,581-Speed 3273.33 samples/sec Loss 3.9295 
LearningRate 0.0213 Epoch: 10 Global Step: 133830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:38,686-Speed 3299.97 samples/sec Loss 4.0056 LearningRate 0.0213 Epoch: 10 Global Step: 133840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:41,789-Speed 3300.94 samples/sec Loss 3.9471 LearningRate 0.0213 Epoch: 10 Global Step: 133850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:44,878-Speed 3315.59 samples/sec Loss 3.9510 LearningRate 0.0213 Epoch: 10 Global Step: 133860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:48,021-Speed 3258.82 samples/sec Loss 3.9233 LearningRate 0.0213 Epoch: 10 Global Step: 133870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:51,082-Speed 3346.90 samples/sec Loss 3.8985 LearningRate 0.0213 Epoch: 10 Global Step: 133880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:54,193-Speed 3292.47 samples/sec Loss 3.9902 LearningRate 0.0213 Epoch: 10 Global Step: 133890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:18:57,283-Speed 3315.03 samples/sec Loss 3.9437 LearningRate 0.0213 Epoch: 10 Global Step: 133900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:19:00,366-Speed 3322.13 samples/sec Loss 3.9844 LearningRate 0.0212 Epoch: 10 Global Step: 133910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:03,426-Speed 3347.72 samples/sec Loss 3.8624 LearningRate 0.0212 Epoch: 10 Global Step: 133920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:06,544-Speed 3284.90 samples/sec Loss 3.9850 LearningRate 0.0212 Epoch: 10 Global Step: 133930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:09,615-Speed 3335.25 samples/sec Loss 4.0063 LearningRate 0.0212 Epoch: 10 Global Step: 133940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:12,741-Speed 3276.84 samples/sec Loss 3.9886 LearningRate 0.0212 Epoch: 10 
Global Step: 133950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:15,914-Speed 3227.89 samples/sec Loss 4.0137 LearningRate 0.0212 Epoch: 10 Global Step: 133960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:19,012-Speed 3307.11 samples/sec Loss 3.9108 LearningRate 0.0212 Epoch: 10 Global Step: 133970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:22,090-Speed 3327.71 samples/sec Loss 4.0072 LearningRate 0.0212 Epoch: 10 Global Step: 133980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:25,274-Speed 3216.98 samples/sec Loss 3.9927 LearningRate 0.0212 Epoch: 10 Global Step: 133990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:28,394-Speed 3282.70 samples/sec Loss 3.9779 LearningRate 0.0212 Epoch: 10 Global Step: 134000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:31,511-Speed 3286.51 samples/sec Loss 3.7833 LearningRate 0.0212 Epoch: 10 Global Step: 134010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:19:34,607-Speed 3307.73 samples/sec Loss 4.0132 LearningRate 0.0212 Epoch: 10 Global Step: 134020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:37,706-Speed 3306.14 samples/sec Loss 4.0233 LearningRate 0.0212 Epoch: 10 Global Step: 134030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:40,790-Speed 3320.92 samples/sec Loss 3.9276 LearningRate 0.0212 Epoch: 10 Global Step: 134040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:43,910-Speed 3283.05 samples/sec Loss 3.9821 LearningRate 0.0212 Epoch: 10 Global Step: 134050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:46,983-Speed 3333.85 samples/sec Loss 4.0354 LearningRate 0.0212 Epoch: 10 Global Step: 134060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:50,143-Speed 3240.89 samples/sec Loss 4.0067 LearningRate 0.0212 Epoch: 10 Global Step: 134070 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:53,263-Speed 3283.76 samples/sec Loss 3.9062 LearningRate 0.0212 Epoch: 10 Global Step: 134080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:56,328-Speed 3342.42 samples/sec Loss 3.9312 LearningRate 0.0212 Epoch: 10 Global Step: 134090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:19:59,428-Speed 3303.46 samples/sec Loss 3.9461 LearningRate 0.0212 Epoch: 10 Global Step: 134100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:02,607-Speed 3222.72 samples/sec Loss 3.9333 LearningRate 0.0212 Epoch: 10 Global Step: 134110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:05,730-Speed 3279.68 samples/sec Loss 4.0031 LearningRate 0.0212 Epoch: 10 Global Step: 134120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:20:08,831-Speed 3302.79 samples/sec Loss 3.9112 LearningRate 0.0212 Epoch: 10 Global Step: 134130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:11,945-Speed 3289.77 samples/sec Loss 3.9343 LearningRate 0.0212 Epoch: 10 Global Step: 134140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:15,038-Speed 3311.46 samples/sec Loss 3.9983 LearningRate 0.0212 Epoch: 10 Global Step: 134150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:18,119-Speed 3324.93 samples/sec Loss 3.9305 LearningRate 0.0212 Epoch: 10 Global Step: 134160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:21,198-Speed 3326.61 samples/sec Loss 3.9262 LearningRate 0.0212 Epoch: 10 Global Step: 134170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:24,313-Speed 3289.14 samples/sec Loss 3.9731 LearningRate 0.0211 Epoch: 10 Global Step: 134180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:27,371-Speed 3349.18 samples/sec Loss 3.9739 LearningRate 0.0211 Epoch: 10 Global Step: 134190 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 13:20:30,463-Speed 3312.82 samples/sec Loss 3.9712 LearningRate 0.0211 Epoch: 10 Global Step: 134200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:33,574-Speed 3293.13 samples/sec Loss 3.9490 LearningRate 0.0211 Epoch: 10 Global Step: 134210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:36,722-Speed 3253.53 samples/sec Loss 4.0251 LearningRate 0.0211 Epoch: 10 Global Step: 134220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:39,823-Speed 3303.80 samples/sec Loss 3.9591 LearningRate 0.0211 Epoch: 10 Global Step: 134230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:42,938-Speed 3287.83 samples/sec Loss 4.0375 LearningRate 0.0211 Epoch: 10 Global Step: 134240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:46,003-Speed 3342.21 samples/sec Loss 3.9310 LearningRate 0.0211 Epoch: 10 Global Step: 134250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:49,125-Speed 3281.05 samples/sec Loss 3.9520 LearningRate 0.0211 Epoch: 10 Global Step: 134260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:52,262-Speed 3265.76 samples/sec Loss 3.9203 LearningRate 0.0211 Epoch: 10 Global Step: 134270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:55,381-Speed 3283.54 samples/sec Loss 4.0094 LearningRate 0.0211 Epoch: 10 Global Step: 134280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:20:58,550-Speed 3232.41 samples/sec Loss 3.9599 LearningRate 0.0211 Epoch: 10 Global Step: 134290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:01,617-Speed 3340.02 samples/sec Loss 3.9180 LearningRate 0.0211 Epoch: 10 Global Step: 134300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:04,698-Speed 3324.26 samples/sec Loss 4.0095 LearningRate 0.0211 Epoch: 10 Global Step: 134310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
13:21:07,792-Speed 3310.68 samples/sec Loss 4.0007 LearningRate 0.0211 Epoch: 10 Global Step: 134320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:10,851-Speed 3347.96 samples/sec Loss 3.9911 LearningRate 0.0211 Epoch: 10 Global Step: 134330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:21:13,969-Speed 3285.25 samples/sec Loss 3.9306 LearningRate 0.0211 Epoch: 10 Global Step: 134340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:21:17,091-Speed 3281.62 samples/sec Loss 4.0027 LearningRate 0.0211 Epoch: 10 Global Step: 134350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:21:20,148-Speed 3350.91 samples/sec Loss 4.0257 LearningRate 0.0211 Epoch: 10 Global Step: 134360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:23,199-Speed 3356.40 samples/sec Loss 3.9548 LearningRate 0.0211 Epoch: 10 Global Step: 134370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:26,296-Speed 3307.87 samples/sec Loss 3.9576 LearningRate 0.0211 Epoch: 10 Global Step: 134380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:29,352-Speed 3352.09 samples/sec Loss 4.0388 LearningRate 0.0211 Epoch: 10 Global Step: 134390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:32,480-Speed 3274.73 samples/sec Loss 4.0127 LearningRate 0.0211 Epoch: 10 Global Step: 134400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:35,563-Speed 3322.50 samples/sec Loss 3.8915 LearningRate 0.0211 Epoch: 10 Global Step: 134410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:38,661-Speed 3306.77 samples/sec Loss 4.0138 LearningRate 0.0211 Epoch: 10 Global Step: 134420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:41,724-Speed 3344.20 samples/sec Loss 3.9751 LearningRate 0.0211 Epoch: 10 Global Step: 134430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:44,785-Speed 3346.26 
samples/sec Loss 3.9392 LearningRate 0.0211 Epoch: 10 Global Step: 134440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:47,865-Speed 3326.17 samples/sec Loss 3.9534 LearningRate 0.0210 Epoch: 10 Global Step: 134450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:51,002-Speed 3264.74 samples/sec Loss 3.9382 LearningRate 0.0210 Epoch: 10 Global Step: 134460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:21:54,079-Speed 3329.01 samples/sec Loss 4.0542 LearningRate 0.0210 Epoch: 10 Global Step: 134470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:21:57,162-Speed 3322.45 samples/sec Loss 4.0418 LearningRate 0.0210 Epoch: 10 Global Step: 134480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:22:00,272-Speed 3293.35 samples/sec Loss 3.9565 LearningRate 0.0210 Epoch: 10 Global Step: 134490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:22:03,350-Speed 3328.57 samples/sec Loss 3.9544 LearningRate 0.0210 Epoch: 10 Global Step: 134500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:06,444-Speed 3310.25 samples/sec Loss 4.0159 LearningRate 0.0210 Epoch: 10 Global Step: 134510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:09,548-Speed 3299.77 samples/sec Loss 3.9490 LearningRate 0.0210 Epoch: 10 Global Step: 134520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:12,629-Speed 3324.53 samples/sec Loss 3.8298 LearningRate 0.0210 Epoch: 10 Global Step: 134530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:15,713-Speed 3321.88 samples/sec Loss 4.0321 LearningRate 0.0210 Epoch: 10 Global Step: 134540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:18,813-Speed 3304.79 samples/sec Loss 3.9113 LearningRate 0.0210 Epoch: 10 Global Step: 134550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:21,881-Speed 3337.68 samples/sec Loss 4.0059 
LearningRate 0.0210 Epoch: 10 Global Step: 134560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:25,049-Speed 3234.29 samples/sec Loss 3.8779 LearningRate 0.0210 Epoch: 10 Global Step: 134570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:28,181-Speed 3270.45 samples/sec Loss 3.9127 LearningRate 0.0210 Epoch: 10 Global Step: 134580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:31,278-Speed 3307.14 samples/sec Loss 3.9095 LearningRate 0.0210 Epoch: 10 Global Step: 134590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:34,348-Speed 3337.17 samples/sec Loss 3.9766 LearningRate 0.0210 Epoch: 10 Global Step: 134600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:22:37,465-Speed 3285.97 samples/sec Loss 3.9349 LearningRate 0.0210 Epoch: 10 Global Step: 134610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:22:40,538-Speed 3333.57 samples/sec Loss 3.8612 LearningRate 0.0210 Epoch: 10 Global Step: 134620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:22:43,613-Speed 3331.00 samples/sec Loss 4.0168 LearningRate 0.0210 Epoch: 10 Global Step: 134630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:22:46,661-Speed 3360.39 samples/sec Loss 4.0111 LearningRate 0.0210 Epoch: 10 Global Step: 134640 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:49,749-Speed 3317.74 samples/sec Loss 3.9699 LearningRate 0.0210 Epoch: 10 Global Step: 134650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:52,819-Speed 3336.64 samples/sec Loss 3.8739 LearningRate 0.0210 Epoch: 10 Global Step: 134660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:55,945-Speed 3276.84 samples/sec Loss 3.9688 LearningRate 0.0210 Epoch: 10 Global Step: 134670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:22:59,086-Speed 3260.44 samples/sec Loss 3.9354 LearningRate 0.0210 Epoch: 10 
Global Step: 134680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:02,262-Speed 3225.49 samples/sec Loss 3.8810 LearningRate 0.0210 Epoch: 10 Global Step: 134690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:05,330-Speed 3339.59 samples/sec Loss 3.9620 LearningRate 0.0210 Epoch: 10 Global Step: 134700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:08,403-Speed 3333.50 samples/sec Loss 4.0166 LearningRate 0.0210 Epoch: 10 Global Step: 134710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:11,468-Speed 3341.35 samples/sec Loss 3.8964 LearningRate 0.0209 Epoch: 10 Global Step: 134720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:14,563-Speed 3310.34 samples/sec Loss 3.9736 LearningRate 0.0209 Epoch: 10 Global Step: 134730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:17,639-Speed 3329.88 samples/sec Loss 4.0166 LearningRate 0.0209 Epoch: 10 Global Step: 134740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:23:20,734-Speed 3309.59 samples/sec Loss 4.0165 LearningRate 0.0209 Epoch: 10 Global Step: 134750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:23,904-Speed 3231.34 samples/sec Loss 3.9068 LearningRate 0.0209 Epoch: 10 Global Step: 134760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:26,992-Speed 3317.54 samples/sec Loss 3.8689 LearningRate 0.0209 Epoch: 10 Global Step: 134770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:30,141-Speed 3252.07 samples/sec Loss 3.9189 LearningRate 0.0209 Epoch: 10 Global Step: 134780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:33,242-Speed 3303.36 samples/sec Loss 3.9920 LearningRate 0.0209 Epoch: 10 Global Step: 134790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:36,401-Speed 3243.19 samples/sec Loss 3.8850 LearningRate 0.0209 Epoch: 10 Global Step: 134800 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:39,502-Speed 3303.16 samples/sec Loss 3.9940 LearningRate 0.0209 Epoch: 10 Global Step: 134810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:42,573-Speed 3335.35 samples/sec Loss 4.0781 LearningRate 0.0209 Epoch: 10 Global Step: 134820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:45,681-Speed 3296.33 samples/sec Loss 3.9370 LearningRate 0.0209 Epoch: 10 Global Step: 134830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:48,891-Speed 3190.54 samples/sec Loss 4.0117 LearningRate 0.0209 Epoch: 10 Global Step: 134840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:23:51,976-Speed 3320.64 samples/sec Loss 4.0412 LearningRate 0.0209 Epoch: 10 Global Step: 134850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:23:55,046-Speed 3336.55 samples/sec Loss 3.9320 LearningRate 0.0209 Epoch: 10 Global Step: 134860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:23:58,149-Speed 3300.47 samples/sec Loss 4.0075 LearningRate 0.0209 Epoch: 10 Global Step: 134870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:01,304-Speed 3246.48 samples/sec Loss 4.0153 LearningRate 0.0209 Epoch: 10 Global Step: 134880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:04,405-Speed 3303.18 samples/sec Loss 3.9724 LearningRate 0.0209 Epoch: 10 Global Step: 134890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:07,495-Speed 3315.15 samples/sec Loss 4.0104 LearningRate 0.0209 Epoch: 10 Global Step: 134900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:10,550-Speed 3353.06 samples/sec Loss 3.9212 LearningRate 0.0209 Epoch: 10 Global Step: 134910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:13,598-Speed 3360.57 samples/sec Loss 3.9225 LearningRate 0.0209 Epoch: 10 Global Step: 134920 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-04-27 13:24:16,653-Speed 3352.51 samples/sec Loss 3.9146 LearningRate 0.0209 Epoch: 10 Global Step: 134930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:19,749-Speed 3309.60 samples/sec Loss 3.9888 LearningRate 0.0209 Epoch: 10 Global Step: 134940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:22,808-Speed 3348.34 samples/sec Loss 4.0363 LearningRate 0.0209 Epoch: 10 Global Step: 134950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:25,944-Speed 3265.95 samples/sec Loss 3.9460 LearningRate 0.0209 Epoch: 10 Global Step: 134960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:29,106-Speed 3239.70 samples/sec Loss 3.9400 LearningRate 0.0209 Epoch: 10 Global Step: 134970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:32,250-Speed 3257.28 samples/sec Loss 3.8884 LearningRate 0.0209 Epoch: 10 Global Step: 134980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:35,401-Speed 3251.28 samples/sec Loss 4.0189 LearningRate 0.0208 Epoch: 10 Global Step: 134990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:38,534-Speed 3269.70 samples/sec Loss 3.9752 LearningRate 0.0208 Epoch: 10 Global Step: 135000 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:41,605-Speed 3335.11 samples/sec Loss 3.9229 LearningRate 0.0208 Epoch: 10 Global Step: 135010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:24:44,720-Speed 3288.46 samples/sec Loss 3.9586 LearningRate 0.0208 Epoch: 10 Global Step: 135020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:47,823-Speed 3300.60 samples/sec Loss 4.1096 LearningRate 0.0208 Epoch: 10 Global Step: 135030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:51,066-Speed 3158.60 samples/sec Loss 3.8992 LearningRate 0.0208 Epoch: 10 Global Step: 135040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
13:24:54,198-Speed 3271.17 samples/sec Loss 3.9777 LearningRate 0.0208 Epoch: 10 Global Step: 135050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:24:57,272-Speed 3332.52 samples/sec Loss 3.9533 LearningRate 0.0208 Epoch: 10 Global Step: 135060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:00,357-Speed 3319.38 samples/sec Loss 4.0161 LearningRate 0.0208 Epoch: 10 Global Step: 135070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:03,463-Speed 3298.62 samples/sec Loss 3.9166 LearningRate 0.0208 Epoch: 10 Global Step: 135080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:06,623-Speed 3241.02 samples/sec Loss 3.9677 LearningRate 0.0208 Epoch: 10 Global Step: 135090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:09,698-Speed 3330.89 samples/sec Loss 3.9900 LearningRate 0.0208 Epoch: 10 Global Step: 135100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:12,772-Speed 3332.64 samples/sec Loss 3.9649 LearningRate 0.0208 Epoch: 10 Global Step: 135110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:15,917-Speed 3257.38 samples/sec Loss 3.9225 LearningRate 0.0208 Epoch: 10 Global Step: 135120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:25:19,033-Speed 3287.12 samples/sec Loss 4.0385 LearningRate 0.0208 Epoch: 10 Global Step: 135130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:22,084-Speed 3357.48 samples/sec Loss 3.9286 LearningRate 0.0208 Epoch: 10 Global Step: 135140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:25,153-Speed 3338.00 samples/sec Loss 3.9312 LearningRate 0.0208 Epoch: 10 Global Step: 135150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:28,247-Speed 3310.66 samples/sec Loss 4.0509 LearningRate 0.0208 Epoch: 10 Global Step: 135160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:31,350-Speed 3300.96 
samples/sec Loss 3.9882 LearningRate 0.0208 Epoch: 10 Global Step: 135170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:34,413-Speed 3344.59 samples/sec Loss 4.0618 LearningRate 0.0208 Epoch: 10 Global Step: 135180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:37,550-Speed 3264.73 samples/sec Loss 4.0294 LearningRate 0.0208 Epoch: 10 Global Step: 135190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:40,692-Speed 3260.60 samples/sec Loss 3.9854 LearningRate 0.0208 Epoch: 10 Global Step: 135200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:43,796-Speed 3299.98 samples/sec Loss 3.9419 LearningRate 0.0208 Epoch: 10 Global Step: 135210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:46,862-Speed 3341.08 samples/sec Loss 3.9864 LearningRate 0.0208 Epoch: 10 Global Step: 135220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:49,957-Speed 3309.53 samples/sec Loss 3.9231 LearningRate 0.0208 Epoch: 10 Global Step: 135230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:53,123-Speed 3235.60 samples/sec Loss 4.0368 LearningRate 0.0208 Epoch: 10 Global Step: 135240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:25:56,221-Speed 3306.03 samples/sec Loss 3.9484 LearningRate 0.0208 Epoch: 10 Global Step: 135250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:25:59,312-Speed 3313.82 samples/sec Loss 4.0013 LearningRate 0.0207 Epoch: 10 Global Step: 135260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:02,371-Speed 3349.19 samples/sec Loss 3.9268 LearningRate 0.0207 Epoch: 10 Global Step: 135270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:05,485-Speed 3288.82 samples/sec Loss 3.9313 LearningRate 0.0207 Epoch: 10 Global Step: 135280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:08,566-Speed 3324.58 samples/sec Loss 4.0616 
LearningRate 0.0207 Epoch: 10 Global Step: 135290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:11,688-Speed 3281.71 samples/sec Loss 3.9092 LearningRate 0.0207 Epoch: 10 Global Step: 135300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:14,843-Speed 3246.06 samples/sec Loss 4.0187 LearningRate 0.0207 Epoch: 10 Global Step: 135310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:18,001-Speed 3243.96 samples/sec Loss 3.9220 LearningRate 0.0207 Epoch: 10 Global Step: 135320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:21,108-Speed 3296.67 samples/sec Loss 3.9113 LearningRate 0.0207 Epoch: 10 Global Step: 135330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:24,302-Speed 3207.59 samples/sec Loss 3.8548 LearningRate 0.0207 Epoch: 10 Global Step: 135340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:27,457-Speed 3246.26 samples/sec Loss 4.0349 LearningRate 0.0207 Epoch: 10 Global Step: 135350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:26:30,559-Speed 3302.06 samples/sec Loss 3.9589 LearningRate 0.0207 Epoch: 10 Global Step: 135360 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:33,610-Speed 3357.11 samples/sec Loss 3.9819 LearningRate 0.0207 Epoch: 10 Global Step: 135370 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:36,742-Speed 3270.54 samples/sec Loss 4.0139 LearningRate 0.0207 Epoch: 10 Global Step: 135380 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:39,877-Speed 3266.72 samples/sec Loss 4.0098 LearningRate 0.0207 Epoch: 10 Global Step: 135390 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:43,078-Speed 3200.55 samples/sec Loss 3.9635 LearningRate 0.0207 Epoch: 10 Global Step: 135400 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:46,145-Speed 3339.96 samples/sec Loss 3.9867 LearningRate 0.0207 Epoch: 10 
Global Step: 135410 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:49,223-Speed 3327.27 samples/sec Loss 3.8951 LearningRate 0.0207 Epoch: 10 Global Step: 135420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:52,369-Speed 3257.02 samples/sec Loss 3.9836 LearningRate 0.0207 Epoch: 10 Global Step: 135430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:55,517-Speed 3253.48 samples/sec Loss 3.9374 LearningRate 0.0207 Epoch: 10 Global Step: 135440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:26:58,574-Speed 3350.47 samples/sec Loss 4.0885 LearningRate 0.0207 Epoch: 10 Global Step: 135450 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:01,732-Speed 3243.88 samples/sec Loss 4.0067 LearningRate 0.0207 Epoch: 10 Global Step: 135460 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:04,857-Speed 3277.87 samples/sec Loss 3.9490 LearningRate 0.0207 Epoch: 10 Global Step: 135470 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:07,938-Speed 3324.52 samples/sec Loss 3.9959 LearningRate 0.0207 Epoch: 10 Global Step: 135480 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:11,043-Speed 3299.58 samples/sec Loss 3.9488 LearningRate 0.0207 Epoch: 10 Global Step: 135490 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:14,183-Speed 3262.12 samples/sec Loss 3.9580 LearningRate 0.0207 Epoch: 10 Global Step: 135500 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:17,290-Speed 3297.12 samples/sec Loss 4.0224 LearningRate 0.0207 Epoch: 10 Global Step: 135510 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:20,399-Speed 3294.51 samples/sec Loss 3.9308 LearningRate 0.0207 Epoch: 10 Global Step: 135520 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:23,486-Speed 3317.50 samples/sec Loss 3.9563 LearningRate 0.0207 Epoch: 10 Global Step: 135530 Fp16 Grad Scale: 
8192 Required: 10 hours Training: 2022-04-27 13:27:26,585-Speed 3305.85 samples/sec Loss 4.0073 LearningRate 0.0206 Epoch: 10 Global Step: 135540 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-27 13:27:29,758-Speed 3228.04 samples/sec Loss 3.9599 LearningRate 0.0206 Epoch: 10 Global Step: 135550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:32,856-Speed 3305.92 samples/sec Loss 3.9198 LearningRate 0.0206 Epoch: 10 Global Step: 135560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:36,009-Speed 3249.80 samples/sec Loss 3.9845 LearningRate 0.0206 Epoch: 10 Global Step: 135570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:39,137-Speed 3274.31 samples/sec Loss 3.9712 LearningRate 0.0206 Epoch: 10 Global Step: 135580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:42,251-Speed 3288.71 samples/sec Loss 3.9161 LearningRate 0.0206 Epoch: 10 Global Step: 135590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:45,312-Speed 3347.15 samples/sec Loss 3.9282 LearningRate 0.0206 Epoch: 10 Global Step: 135600 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:48,400-Speed 3316.52 samples/sec Loss 4.0045 LearningRate 0.0206 Epoch: 10 Global Step: 135610 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:51,486-Speed 3319.62 samples/sec Loss 3.9485 LearningRate 0.0206 Epoch: 10 Global Step: 135620 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:54,609-Speed 3279.70 samples/sec Loss 3.9589 LearningRate 0.0206 Epoch: 10 Global Step: 135630 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:27:57,699-Speed 3315.43 samples/sec Loss 3.8920 LearningRate 0.0206 Epoch: 10 Global Step: 135640 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:00,814-Speed 3288.57 samples/sec Loss 4.0623 LearningRate 0.0206 Epoch: 10 Global Step: 135650 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 13:28:03,953-Speed 3263.20 samples/sec Loss 3.9589 LearningRate 0.0206 Epoch: 10 Global Step: 135660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:28:07,054-Speed 3303.38 samples/sec Loss 4.0234 LearningRate 0.0206 Epoch: 10 Global Step: 135670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:28:10,107-Speed 3354.61 samples/sec Loss 3.9749 LearningRate 0.0206 Epoch: 10 Global Step: 135680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:13,224-Speed 3286.48 samples/sec Loss 3.9864 LearningRate 0.0206 Epoch: 10 Global Step: 135690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:16,330-Speed 3298.19 samples/sec Loss 4.0038 LearningRate 0.0206 Epoch: 10 Global Step: 135700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:19,436-Speed 3297.31 samples/sec Loss 3.9614 LearningRate 0.0206 Epoch: 10 Global Step: 135710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:22,501-Speed 3342.88 samples/sec Loss 3.9696 LearningRate 0.0206 Epoch: 10 Global Step: 135720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:25,596-Speed 3309.13 samples/sec Loss 3.9722 LearningRate 0.0206 Epoch: 10 Global Step: 135730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:28,666-Speed 3336.82 samples/sec Loss 3.9482 LearningRate 0.0206 Epoch: 10 Global Step: 135740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:31,785-Speed 3284.08 samples/sec Loss 3.9780 LearningRate 0.0206 Epoch: 10 Global Step: 135750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:34,900-Speed 3289.02 samples/sec Loss 3.9825 LearningRate 0.0206 Epoch: 10 Global Step: 135760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:37,996-Speed 3308.36 samples/sec Loss 3.9032 LearningRate 0.0206 Epoch: 10 Global Step: 135770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
13:28:41,130-Speed 3268.58 samples/sec Loss 4.0090 LearningRate 0.0206 Epoch: 10 Global Step: 135780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:28:44,193-Speed 3344.23 samples/sec Loss 3.9506 LearningRate 0.0206 Epoch: 10 Global Step: 135790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:28:47,290-Speed 3308.29 samples/sec Loss 3.8767 LearningRate 0.0206 Epoch: 10 Global Step: 135800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:28:50,447-Speed 3243.67 samples/sec Loss 3.9930 LearningRate 0.0205 Epoch: 10 Global Step: 135810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:28:53,605-Speed 3243.71 samples/sec Loss 4.0113 LearningRate 0.0205 Epoch: 10 Global Step: 135820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:28:56,706-Speed 3303.68 samples/sec Loss 3.9787 LearningRate 0.0205 Epoch: 10 Global Step: 135830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:28:59,857-Speed 3250.41 samples/sec Loss 3.9461 LearningRate 0.0205 Epoch: 10 Global Step: 135840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:02,920-Speed 3344.72 samples/sec Loss 3.9603 LearningRate 0.0205 Epoch: 10 Global Step: 135850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:06,055-Speed 3267.25 samples/sec Loss 3.9919 LearningRate 0.0205 Epoch: 10 Global Step: 135860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:09,111-Speed 3352.11 samples/sec Loss 3.9679 LearningRate 0.0205 Epoch: 10 Global Step: 135870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:12,245-Speed 3268.48 samples/sec Loss 3.9727 LearningRate 0.0205 Epoch: 10 Global Step: 135880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:15,321-Speed 3329.94 samples/sec Loss 3.9672 LearningRate 0.0205 Epoch: 10 Global Step: 135890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:18,421-Speed 3304.49 
samples/sec Loss 4.0392 LearningRate 0.0205 Epoch: 10 Global Step: 135900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:21,496-Speed 3330.93 samples/sec Loss 4.0050 LearningRate 0.0205 Epoch: 10 Global Step: 135910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:24,561-Speed 3342.04 samples/sec Loss 3.9334 LearningRate 0.0205 Epoch: 10 Global Step: 135920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:27,651-Speed 3314.72 samples/sec Loss 4.0333 LearningRate 0.0205 Epoch: 10 Global Step: 135930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:29:30,806-Speed 3246.60 samples/sec Loss 3.9763 LearningRate 0.0205 Epoch: 10 Global Step: 135940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:29:33,880-Speed 3332.28 samples/sec Loss 3.9655 LearningRate 0.0205 Epoch: 10 Global Step: 135950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:29:36,963-Speed 3322.90 samples/sec Loss 3.9582 LearningRate 0.0205 Epoch: 10 Global Step: 135960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:29:40,041-Speed 3327.75 samples/sec Loss 3.9963 LearningRate 0.0205 Epoch: 10 Global Step: 135970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:43,136-Speed 3308.82 samples/sec Loss 4.0569 LearningRate 0.0205 Epoch: 10 Global Step: 135980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:46,208-Speed 3335.32 samples/sec Loss 4.0936 LearningRate 0.0205 Epoch: 10 Global Step: 135990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:49,291-Speed 3322.28 samples/sec Loss 3.9941 LearningRate 0.0205 Epoch: 10 Global Step: 136000 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:52,351-Speed 3346.83 samples/sec Loss 3.8986 LearningRate 0.0205 Epoch: 10 Global Step: 136010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:55,436-Speed 3320.36 samples/sec Loss 3.9642 
LearningRate 0.0205 Epoch: 10 Global Step: 136020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:29:58,532-Speed 3308.92 samples/sec Loss 4.0128 LearningRate 0.0205 Epoch: 10 Global Step: 136030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:30:01,604-Speed 3333.74 samples/sec Loss 3.9750 LearningRate 0.0205 Epoch: 10 Global Step: 136040 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:30:04,748-Speed 3258.67 samples/sec Loss 4.0492 LearningRate 0.0205 Epoch: 10 Global Step: 136050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:30:07,883-Speed 3267.37 samples/sec Loss 4.0182 LearningRate 0.0205 Epoch: 10 Global Step: 136060 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:30:10,945-Speed 3345.23 samples/sec Loss 3.9413 LearningRate 0.0205 Epoch: 10 Global Step: 136070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:14,085-Speed 3262.63 samples/sec Loss 3.9702 LearningRate 0.0205 Epoch: 10 Global Step: 136080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:17,171-Speed 3318.59 samples/sec Loss 4.0049 LearningRate 0.0204 Epoch: 10 Global Step: 136090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:20,221-Speed 3359.02 samples/sec Loss 3.9159 LearningRate 0.0204 Epoch: 10 Global Step: 136100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:23,345-Speed 3278.86 samples/sec Loss 3.9804 LearningRate 0.0204 Epoch: 10 Global Step: 136110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:26,522-Speed 3223.56 samples/sec Loss 3.9008 LearningRate 0.0204 Epoch: 10 Global Step: 136120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:29,639-Speed 3286.57 samples/sec Loss 3.9306 LearningRate 0.0204 Epoch: 10 Global Step: 136130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:32,752-Speed 3290.20 samples/sec Loss 3.9972 LearningRate 0.0204 Epoch: 10 
Global Step: 136140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:35,825-Speed 3333.84 samples/sec Loss 3.9932 LearningRate 0.0204 Epoch: 10 Global Step: 136150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:38,961-Speed 3266.02 samples/sec Loss 3.9800 LearningRate 0.0204 Epoch: 10 Global Step: 136160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:42,050-Speed 3316.73 samples/sec Loss 3.9386 LearningRate 0.0204 Epoch: 10 Global Step: 136170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:30:45,139-Speed 3315.41 samples/sec Loss 3.9335 LearningRate 0.0204 Epoch: 10 Global Step: 136180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:48,276-Speed 3265.40 samples/sec Loss 3.9313 LearningRate 0.0204 Epoch: 10 Global Step: 136190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:51,356-Speed 3325.57 samples/sec Loss 3.9336 LearningRate 0.0204 Epoch: 10 Global Step: 136200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:30:54,469-Speed 3291.45 samples/sec Loss 4.0556 LearningRate 0.0204 Epoch: 10 Global Step: 136210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:30:57,522-Speed 3354.37 samples/sec Loss 3.8654 LearningRate 0.0204 Epoch: 10 Global Step: 136220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:00,618-Speed 3308.83 samples/sec Loss 3.9089 LearningRate 0.0204 Epoch: 10 Global Step: 136230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:03,763-Speed 3256.93 samples/sec Loss 4.0174 LearningRate 0.0204 Epoch: 10 Global Step: 136240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:06,934-Speed 3230.80 samples/sec Loss 4.0088 LearningRate 0.0204 Epoch: 10 Global Step: 136250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:10,063-Speed 3273.34 samples/sec Loss 3.9088 LearningRate 0.0204 Epoch: 10 Global Step: 136260 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:13,195-Speed 3270.76 samples/sec Loss 3.9551 LearningRate 0.0204 Epoch: 10 Global Step: 136270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:16,287-Speed 3313.29 samples/sec Loss 3.8926 LearningRate 0.0204 Epoch: 10 Global Step: 136280 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:19,408-Speed 3282.00 samples/sec Loss 3.9390 LearningRate 0.0204 Epoch: 10 Global Step: 136290 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:22,512-Speed 3299.91 samples/sec Loss 3.9563 LearningRate 0.0204 Epoch: 10 Global Step: 136300 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:25,626-Speed 3288.86 samples/sec Loss 3.9436 LearningRate 0.0204 Epoch: 10 Global Step: 136310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:31:28,810-Speed 3216.84 samples/sec Loss 4.1478 LearningRate 0.0204 Epoch: 10 Global Step: 136320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:31:31,911-Speed 3304.15 samples/sec Loss 3.9684 LearningRate 0.0204 Epoch: 10 Global Step: 136330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:31:35,027-Speed 3287.34 samples/sec Loss 4.0181 LearningRate 0.0204 Epoch: 10 Global Step: 136340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:31:38,125-Speed 3306.01 samples/sec Loss 3.9329 LearningRate 0.0204 Epoch: 10 Global Step: 136350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:31:41,337-Speed 3189.40 samples/sec Loss 3.9696 LearningRate 0.0203 Epoch: 10 Global Step: 136360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:31:44,435-Speed 3306.63 samples/sec Loss 3.9795 LearningRate 0.0203 Epoch: 10 Global Step: 136370 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:47,541-Speed 3298.47 samples/sec Loss 4.0089 LearningRate 0.0203 Epoch: 10 Global Step: 136380 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-04-27 13:31:50,713-Speed 3229.46 samples/sec Loss 3.9158 LearningRate 0.0203 Epoch: 10 Global Step: 136390 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:53,789-Speed 3329.18 samples/sec Loss 3.9221 LearningRate 0.0203 Epoch: 10 Global Step: 136400 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:56,869-Speed 3326.55 samples/sec Loss 4.0560 LearningRate 0.0203 Epoch: 10 Global Step: 136410 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:31:59,943-Speed 3332.68 samples/sec Loss 3.9597 LearningRate 0.0203 Epoch: 10 Global Step: 136420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:03,069-Speed 3276.64 samples/sec Loss 4.0186 LearningRate 0.0203 Epoch: 10 Global Step: 136430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:06,142-Speed 3333.30 samples/sec Loss 3.9450 LearningRate 0.0203 Epoch: 10 Global Step: 136440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:09,214-Speed 3333.89 samples/sec Loss 3.9855 LearningRate 0.0203 Epoch: 10 Global Step: 136450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:12,306-Speed 3312.82 samples/sec Loss 3.9591 LearningRate 0.0203 Epoch: 10 Global Step: 136460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:15,494-Speed 3213.04 samples/sec Loss 3.9361 LearningRate 0.0203 Epoch: 10 Global Step: 136470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:32:18,637-Speed 3259.03 samples/sec Loss 4.0047 LearningRate 0.0203 Epoch: 10 Global Step: 136480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:21,704-Speed 3339.83 samples/sec Loss 4.0243 LearningRate 0.0203 Epoch: 10 Global Step: 136490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:24,821-Speed 3286.25 samples/sec Loss 3.9781 LearningRate 0.0203 Epoch: 10 Global Step: 136500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
13:32:28,016-Speed 3206.01 samples/sec Loss 3.9328 LearningRate 0.0203 Epoch: 10 Global Step: 136510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:31,197-Speed 3219.89 samples/sec Loss 3.9677 LearningRate 0.0203 Epoch: 10 Global Step: 136520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:34,260-Speed 3344.54 samples/sec Loss 4.0075 LearningRate 0.0203 Epoch: 10 Global Step: 136530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:37,362-Speed 3302.14 samples/sec Loss 4.0433 LearningRate 0.0203 Epoch: 10 Global Step: 136540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:40,423-Speed 3345.88 samples/sec Loss 3.8995 LearningRate 0.0203 Epoch: 10 Global Step: 136550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:43,513-Speed 3314.81 samples/sec Loss 4.0017 LearningRate 0.0203 Epoch: 10 Global Step: 136560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:46,592-Speed 3327.39 samples/sec Loss 3.9826 LearningRate 0.0203 Epoch: 10 Global Step: 136570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:32:49,702-Speed 3293.39 samples/sec Loss 3.9798 LearningRate 0.0203 Epoch: 10 Global Step: 136580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:32:52,816-Speed 3289.77 samples/sec Loss 3.9627 LearningRate 0.0203 Epoch: 10 Global Step: 136590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:32:55,893-Speed 3329.43 samples/sec Loss 4.0171 LearningRate 0.0203 Epoch: 10 Global Step: 136600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:32:59,007-Speed 3288.48 samples/sec Loss 4.0126 LearningRate 0.0203 Epoch: 10 Global Step: 136610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:02,344-Speed 3069.69 samples/sec Loss 3.9283 LearningRate 0.0203 Epoch: 10 Global Step: 136620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:05,531-Speed 3214.22 
samples/sec Loss 3.9324 LearningRate 0.0203 Epoch: 10 Global Step: 136630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:37,707-Speed 318.27 samples/sec Loss 2.9401 LearningRate 0.0202 Epoch: 11 Global Step: 136640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:41,185-Speed 2945.04 samples/sec Loss 2.8611 LearningRate 0.0202 Epoch: 11 Global Step: 136650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:44,269-Speed 3322.10 samples/sec Loss 2.8331 LearningRate 0.0202 Epoch: 11 Global Step: 136660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:47,320-Speed 3356.78 samples/sec Loss 2.7897 LearningRate 0.0202 Epoch: 11 Global Step: 136670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:50,365-Speed 3363.53 samples/sec Loss 2.7682 LearningRate 0.0202 Epoch: 11 Global Step: 136680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:53,543-Speed 3223.48 samples/sec Loss 2.8845 LearningRate 0.0202 Epoch: 11 Global Step: 136690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:56,666-Speed 3280.16 samples/sec Loss 2.9041 LearningRate 0.0202 Epoch: 11 Global Step: 136700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:33:59,856-Speed 3210.77 samples/sec Loss 2.9079 LearningRate 0.0202 Epoch: 11 Global Step: 136710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:03,007-Speed 3251.23 samples/sec Loss 2.9523 LearningRate 0.0202 Epoch: 11 Global Step: 136720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:06,106-Speed 3305.22 samples/sec Loss 2.9998 LearningRate 0.0202 Epoch: 11 Global Step: 136730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:09,160-Speed 3353.80 samples/sec Loss 2.8862 LearningRate 0.0202 Epoch: 11 Global Step: 136740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:12,287-Speed 3276.70 samples/sec Loss 2.8751 
LearningRate 0.0202 Epoch: 11 Global Step: 136750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:15,452-Speed 3235.69 samples/sec Loss 2.9094 LearningRate 0.0202 Epoch: 11 Global Step: 136760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:18,607-Speed 3247.46 samples/sec Loss 2.8508 LearningRate 0.0202 Epoch: 11 Global Step: 136770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:21,671-Speed 3343.16 samples/sec Loss 2.8919 LearningRate 0.0202 Epoch: 11 Global Step: 136780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:34:24,798-Speed 3275.68 samples/sec Loss 2.8054 LearningRate 0.0202 Epoch: 11 Global Step: 136790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:27,985-Speed 3213.83 samples/sec Loss 2.8865 LearningRate 0.0202 Epoch: 11 Global Step: 136800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:31,246-Speed 3141.07 samples/sec Loss 2.9381 LearningRate 0.0202 Epoch: 11 Global Step: 136810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:34,417-Speed 3229.72 samples/sec Loss 2.8432 LearningRate 0.0202 Epoch: 11 Global Step: 136820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:37,911-Speed 2932.22 samples/sec Loss 2.9254 LearningRate 0.0202 Epoch: 11 Global Step: 136830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:41,074-Speed 3238.48 samples/sec Loss 2.9179 LearningRate 0.0202 Epoch: 11 Global Step: 136840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:44,196-Speed 3280.38 samples/sec Loss 2.8516 LearningRate 0.0202 Epoch: 11 Global Step: 136850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:47,298-Speed 3303.05 samples/sec Loss 2.8545 LearningRate 0.0202 Epoch: 11 Global Step: 136860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:50,395-Speed 3307.43 samples/sec Loss 2.9388 LearningRate 0.0202 Epoch: 11 
Global Step: 136870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:34:53,663-Speed 3134.42 samples/sec Loss 2.9119 LearningRate 0.0202 Epoch: 11 Global Step: 136880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:34:56,739-Speed 3329.98 samples/sec Loss 2.8979 LearningRate 0.0202 Epoch: 11 Global Step: 136890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:34:59,810-Speed 3335.53 samples/sec Loss 2.9367 LearningRate 0.0202 Epoch: 11 Global Step: 136900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:02,872-Speed 3345.14 samples/sec Loss 2.8235 LearningRate 0.0201 Epoch: 11 Global Step: 136910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:06,007-Speed 3267.69 samples/sec Loss 2.9321 LearningRate 0.0201 Epoch: 11 Global Step: 136920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:09,117-Speed 3292.71 samples/sec Loss 2.9390 LearningRate 0.0201 Epoch: 11 Global Step: 136930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:12,204-Speed 3318.23 samples/sec Loss 2.9867 LearningRate 0.0201 Epoch: 11 Global Step: 136940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:15,314-Speed 3293.82 samples/sec Loss 3.0147 LearningRate 0.0201 Epoch: 11 Global Step: 136950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:18,414-Speed 3304.50 samples/sec Loss 3.0088 LearningRate 0.0201 Epoch: 11 Global Step: 136960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:21,529-Speed 3288.26 samples/sec Loss 2.9474 LearningRate 0.0201 Epoch: 11 Global Step: 136970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:24,633-Speed 3300.73 samples/sec Loss 2.9411 LearningRate 0.0201 Epoch: 11 Global Step: 136980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:35:27,764-Speed 3271.11 samples/sec Loss 2.9918 LearningRate 0.0201 Epoch: 11 Global Step: 136990 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 13:35:30,839-Speed 3331.36 samples/sec Loss 2.9020 LearningRate 0.0201 Epoch: 11 Global Step: 137000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:35:33,906-Speed 3339.10 samples/sec Loss 2.8617 LearningRate 0.0201 Epoch: 11 Global Step: 137010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:35:37,011-Speed 3299.06 samples/sec Loss 2.8938 LearningRate 0.0201 Epoch: 11 Global Step: 137020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:35:40,110-Speed 3306.02 samples/sec Loss 3.0542 LearningRate 0.0201 Epoch: 11 Global Step: 137030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:35:43,266-Speed 3245.71 samples/sec Loss 2.9961 LearningRate 0.0201 Epoch: 11 Global Step: 137040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:35:46,334-Speed 3338.64 samples/sec Loss 2.9213 LearningRate 0.0201 Epoch: 11 Global Step: 137050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:49,384-Speed 3357.48 samples/sec Loss 2.8990 LearningRate 0.0201 Epoch: 11 Global Step: 137060 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:52,462-Speed 3329.20 samples/sec Loss 2.9681 LearningRate 0.0201 Epoch: 11 Global Step: 137070 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:55,535-Speed 3333.13 samples/sec Loss 2.9344 LearningRate 0.0201 Epoch: 11 Global Step: 137080 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:35:58,591-Speed 3351.76 samples/sec Loss 2.9083 LearningRate 0.0201 Epoch: 11 Global Step: 137090 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:01,666-Speed 3330.56 samples/sec Loss 2.9118 LearningRate 0.0201 Epoch: 11 Global Step: 137100 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:04,752-Speed 3319.89 samples/sec Loss 2.9780 LearningRate 0.0201 Epoch: 11 Global Step: 137110 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-04-27 13:36:07,810-Speed 3349.47 samples/sec Loss 2.9103 LearningRate 0.0201 Epoch: 11 Global Step: 137120 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:10,892-Speed 3323.16 samples/sec Loss 3.0268 LearningRate 0.0201 Epoch: 11 Global Step: 137130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:14,102-Speed 3191.12 samples/sec Loss 2.9934 LearningRate 0.0201 Epoch: 11 Global Step: 137140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:17,333-Speed 3170.06 samples/sec Loss 2.9284 LearningRate 0.0201 Epoch: 11 Global Step: 137150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:36:20,438-Speed 3299.63 samples/sec Loss 2.9272 LearningRate 0.0201 Epoch: 11 Global Step: 137160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:36:23,568-Speed 3272.73 samples/sec Loss 2.9824 LearningRate 0.0201 Epoch: 11 Global Step: 137170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:36:26,698-Speed 3272.42 samples/sec Loss 2.9569 LearningRate 0.0201 Epoch: 11 Global Step: 137180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:36:29,786-Speed 3317.10 samples/sec Loss 2.9560 LearningRate 0.0200 Epoch: 11 Global Step: 137190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:32,838-Speed 3355.87 samples/sec Loss 2.9762 LearningRate 0.0200 Epoch: 11 Global Step: 137200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:35,930-Speed 3313.26 samples/sec Loss 2.9393 LearningRate 0.0200 Epoch: 11 Global Step: 137210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:39,052-Speed 3281.40 samples/sec Loss 2.9911 LearningRate 0.0200 Epoch: 11 Global Step: 137220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:42,130-Speed 3327.95 samples/sec Loss 3.0028 LearningRate 0.0200 Epoch: 11 Global Step: 137230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
13:36:45,183-Speed 3354.19 samples/sec Loss 3.0195 LearningRate 0.0200 Epoch: 11 Global Step: 137240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:48,317-Speed 3269.45 samples/sec Loss 3.0512 LearningRate 0.0200 Epoch: 11 Global Step: 137250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:51,409-Speed 3312.49 samples/sec Loss 2.9080 LearningRate 0.0200 Epoch: 11 Global Step: 137260 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:54,464-Speed 3352.61 samples/sec Loss 2.9355 LearningRate 0.0200 Epoch: 11 Global Step: 137270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:36:57,517-Speed 3354.98 samples/sec Loss 2.9216 LearningRate 0.0200 Epoch: 11 Global Step: 137280 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:37:00,654-Speed 3265.23 samples/sec Loss 2.9101 LearningRate 0.0200 Epoch: 11 Global Step: 137290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:03,733-Speed 3327.52 samples/sec Loss 2.9536 LearningRate 0.0200 Epoch: 11 Global Step: 137300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:06,814-Speed 3324.63 samples/sec Loss 2.9760 LearningRate 0.0200 Epoch: 11 Global Step: 137310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:09,880-Speed 3341.42 samples/sec Loss 3.0037 LearningRate 0.0200 Epoch: 11 Global Step: 137320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:12,992-Speed 3290.93 samples/sec Loss 2.9794 LearningRate 0.0200 Epoch: 11 Global Step: 137330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:16,121-Speed 3273.31 samples/sec Loss 3.0347 LearningRate 0.0200 Epoch: 11 Global Step: 137340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:19,214-Speed 3312.34 samples/sec Loss 3.0643 LearningRate 0.0200 Epoch: 11 Global Step: 137350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:22,301-Speed 3317.39 
samples/sec Loss 2.9784 LearningRate 0.0200 Epoch: 11 Global Step: 137360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:25,382-Speed 3325.54 samples/sec Loss 2.9640 LearningRate 0.0200 Epoch: 11 Global Step: 137370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:28,481-Speed 3305.17 samples/sec Loss 3.0190 LearningRate 0.0200 Epoch: 11 Global Step: 137380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:31,680-Speed 3201.65 samples/sec Loss 2.9705 LearningRate 0.0200 Epoch: 11 Global Step: 137390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 13:37:34,758-Speed 3327.78 samples/sec Loss 2.9778 LearningRate 0.0200 Epoch: 11 Global Step: 137400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:37,874-Speed 3287.74 samples/sec Loss 3.0479 LearningRate 0.0200 Epoch: 11 Global Step: 137410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:40,969-Speed 3310.05 samples/sec Loss 2.9792 LearningRate 0.0200 Epoch: 11 Global Step: 137420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:44,051-Speed 3322.94 samples/sec Loss 3.1022 LearningRate 0.0200 Epoch: 11 Global Step: 137430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:47,113-Speed 3345.88 samples/sec Loss 3.0029 LearningRate 0.0200 Epoch: 11 Global Step: 137440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:50,252-Speed 3262.53 samples/sec Loss 3.0026 LearningRate 0.0200 Epoch: 11 Global Step: 137450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:53,410-Speed 3244.11 samples/sec Loss 2.9949 LearningRate 0.0200 Epoch: 11 Global Step: 137460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:56,466-Speed 3352.17 samples/sec Loss 2.9885 LearningRate 0.0199 Epoch: 11 Global Step: 137470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:37:59,581-Speed 3288.44 samples/sec Loss 3.0573 
LearningRate 0.0199 Epoch: 11 Global Step: 137480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:02,774-Speed 3207.39 samples/sec Loss 3.0157 LearningRate 0.0199 Epoch: 11 Global Step: 137490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:05,878-Speed 3300.78 samples/sec Loss 2.9805 LearningRate 0.0199 Epoch: 11 Global Step: 137500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:08,931-Speed 3354.88 samples/sec Loss 2.9520 LearningRate 0.0199 Epoch: 11 Global Step: 137510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:12,010-Speed 3327.23 samples/sec Loss 2.9963 LearningRate 0.0199 Epoch: 11 Global Step: 137520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:15,172-Speed 3239.45 samples/sec Loss 3.0443 LearningRate 0.0199 Epoch: 11 Global Step: 137530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:18,267-Speed 3309.22 samples/sec Loss 3.0916 LearningRate 0.0199 Epoch: 11 Global Step: 137540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:21,332-Speed 3341.67 samples/sec Loss 3.0639 LearningRate 0.0199 Epoch: 11 Global Step: 137550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:24,407-Speed 3331.18 samples/sec Loss 3.0271 LearningRate 0.0199 Epoch: 11 Global Step: 137560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:27,514-Speed 3297.33 samples/sec Loss 3.0324 LearningRate 0.0199 Epoch: 11 Global Step: 137570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:38:30,605-Speed 3313.73 samples/sec Loss 2.9618 LearningRate 0.0199 Epoch: 11 Global Step: 137580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:33,688-Speed 3322.73 samples/sec Loss 3.0481 LearningRate 0.0199 Epoch: 11 Global Step: 137590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:36,794-Speed 3297.85 samples/sec Loss 3.0899 LearningRate 0.0199 Epoch: 11 
Global Step: 137600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:39,865-Speed 3334.63 samples/sec Loss 3.0587 LearningRate 0.0199 Epoch: 11 Global Step: 137610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:42,976-Speed 3292.60 samples/sec Loss 3.0024 LearningRate 0.0199 Epoch: 11 Global Step: 137620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:46,037-Speed 3347.02 samples/sec Loss 3.0826 LearningRate 0.0199 Epoch: 11 Global Step: 137630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:49,156-Speed 3284.14 samples/sec Loss 3.0410 LearningRate 0.0199 Epoch: 11 Global Step: 137640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:52,350-Speed 3206.68 samples/sec Loss 3.0573 LearningRate 0.0199 Epoch: 11 Global Step: 137650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:55,457-Speed 3297.05 samples/sec Loss 3.0693 LearningRate 0.0199 Epoch: 11 Global Step: 137660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:38:58,497-Speed 3369.58 samples/sec Loss 3.0268 LearningRate 0.0199 Epoch: 11 Global Step: 137670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:01,602-Speed 3299.23 samples/sec Loss 3.0078 LearningRate 0.0199 Epoch: 11 Global Step: 137680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:04,705-Speed 3301.35 samples/sec Loss 2.9507 LearningRate 0.0199 Epoch: 11 Global Step: 137690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:07,811-Speed 3297.61 samples/sec Loss 3.0718 LearningRate 0.0199 Epoch: 11 Global Step: 137700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:10,913-Speed 3301.97 samples/sec Loss 3.1006 LearningRate 0.0199 Epoch: 11 Global Step: 137710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:14,088-Speed 3226.13 samples/sec Loss 3.0190 LearningRate 0.0199 Epoch: 11 Global Step: 137720 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:17,151-Speed 3344.57 samples/sec Loss 3.0691 LearningRate 0.0199 Epoch: 11 Global Step: 137730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:20,213-Speed 3344.37 samples/sec Loss 3.0226 LearningRate 0.0199 Epoch: 11 Global Step: 137740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:23,345-Speed 3270.91 samples/sec Loss 3.0287 LearningRate 0.0198 Epoch: 11 Global Step: 137750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:26,499-Speed 3247.76 samples/sec Loss 3.0392 LearningRate 0.0198 Epoch: 11 Global Step: 137760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:29,617-Speed 3285.18 samples/sec Loss 3.1354 LearningRate 0.0198 Epoch: 11 Global Step: 137770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:39:32,781-Speed 3237.96 samples/sec Loss 3.0952 LearningRate 0.0198 Epoch: 11 Global Step: 137780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:39:36,564-Speed 2707.40 samples/sec Loss 3.1069 LearningRate 0.0198 Epoch: 11 Global Step: 137790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:39:39,626-Speed 3345.69 samples/sec Loss 3.0278 LearningRate 0.0198 Epoch: 11 Global Step: 137800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:42,736-Speed 3293.39 samples/sec Loss 3.0545 LearningRate 0.0198 Epoch: 11 Global Step: 137810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:45,788-Speed 3355.64 samples/sec Loss 3.1130 LearningRate 0.0198 Epoch: 11 Global Step: 137820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:48,871-Speed 3322.78 samples/sec Loss 3.0758 LearningRate 0.0198 Epoch: 11 Global Step: 137830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:51,954-Speed 3323.28 samples/sec Loss 3.1478 LearningRate 0.0198 Epoch: 11 Global Step: 137840 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-04-27 13:39:55,079-Speed 3277.46 samples/sec Loss 3.1008 LearningRate 0.0198 Epoch: 11 Global Step: 137850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:39:58,151-Speed 3333.97 samples/sec Loss 3.0776 LearningRate 0.0198 Epoch: 11 Global Step: 137860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:01,343-Speed 3209.44 samples/sec Loss 3.1136 LearningRate 0.0198 Epoch: 11 Global Step: 137870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:04,530-Speed 3214.11 samples/sec Loss 3.0440 LearningRate 0.0198 Epoch: 11 Global Step: 137880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:07,682-Speed 3249.20 samples/sec Loss 3.0770 LearningRate 0.0198 Epoch: 11 Global Step: 137890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:10,793-Speed 3292.42 samples/sec Loss 2.9812 LearningRate 0.0198 Epoch: 11 Global Step: 137900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:40:13,903-Speed 3294.14 samples/sec Loss 3.0852 LearningRate 0.0198 Epoch: 11 Global Step: 137910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:40:17,086-Speed 3218.13 samples/sec Loss 3.0874 LearningRate 0.0198 Epoch: 11 Global Step: 137920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:40:20,246-Speed 3241.62 samples/sec Loss 3.0427 LearningRate 0.0198 Epoch: 11 Global Step: 137930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:40:23,359-Speed 3290.00 samples/sec Loss 3.0831 LearningRate 0.0198 Epoch: 11 Global Step: 137940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:40:26,460-Speed 3303.32 samples/sec Loss 3.0613 LearningRate 0.0198 Epoch: 11 Global Step: 137950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:40:29,607-Speed 3254.81 samples/sec Loss 3.0311 LearningRate 0.0198 Epoch: 11 Global Step: 137960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
13:40:32,743-Speed 3265.90 samples/sec Loss 3.1829 LearningRate 0.0198 Epoch: 11 Global Step: 137970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:35,890-Speed 3255.53 samples/sec Loss 3.0939 LearningRate 0.0198 Epoch: 11 Global Step: 137980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:39,006-Speed 3287.24 samples/sec Loss 3.1438 LearningRate 0.0198 Epoch: 11 Global Step: 137990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:42,148-Speed 3259.88 samples/sec Loss 3.1318 LearningRate 0.0198 Epoch: 11 Global Step: 138000 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:45,252-Speed 3299.70 samples/sec Loss 3.0678 LearningRate 0.0198 Epoch: 11 Global Step: 138010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:48,394-Speed 3260.43 samples/sec Loss 3.0242 LearningRate 0.0197 Epoch: 11 Global Step: 138020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:51,476-Speed 3323.83 samples/sec Loss 3.0944 LearningRate 0.0197 Epoch: 11 Global Step: 138030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:54,596-Speed 3282.74 samples/sec Loss 3.0501 LearningRate 0.0197 Epoch: 11 Global Step: 138040 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:40:57,696-Speed 3304.19 samples/sec Loss 3.1330 LearningRate 0.0197 Epoch: 11 Global Step: 138050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:00,917-Speed 3180.43 samples/sec Loss 3.1153 LearningRate 0.0197 Epoch: 11 Global Step: 138060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:41:04,047-Speed 3272.83 samples/sec Loss 3.1172 LearningRate 0.0197 Epoch: 11 Global Step: 138070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:41:07,792-Speed 2734.91 samples/sec Loss 3.0496 LearningRate 0.0197 Epoch: 11 Global Step: 138080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:41:10,909-Speed 3285.97 
samples/sec Loss 3.0773 LearningRate 0.0197 Epoch: 11 Global Step: 138090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:41:15,334-Speed 2314.56 samples/sec Loss 3.0966 LearningRate 0.0197 Epoch: 11 Global Step: 138100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:41:19,825-Speed 2280.89 samples/sec Loss 3.0684 LearningRate 0.0197 Epoch: 11 Global Step: 138110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:41:22,906-Speed 3324.83 samples/sec Loss 3.1233 LearningRate 0.0197 Epoch: 11 Global Step: 138120 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:26,009-Speed 3300.95 samples/sec Loss 3.1196 LearningRate 0.0197 Epoch: 11 Global Step: 138130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:29,154-Speed 3256.85 samples/sec Loss 3.0760 LearningRate 0.0197 Epoch: 11 Global Step: 138140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:32,311-Speed 3244.78 samples/sec Loss 3.1260 LearningRate 0.0197 Epoch: 11 Global Step: 138150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:35,497-Speed 3215.55 samples/sec Loss 3.0963 LearningRate 0.0197 Epoch: 11 Global Step: 138160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:38,668-Speed 3229.86 samples/sec Loss 3.1184 LearningRate 0.0197 Epoch: 11 Global Step: 138170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:41,766-Speed 3306.10 samples/sec Loss 3.1344 LearningRate 0.0197 Epoch: 11 Global Step: 138180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:44,849-Speed 3323.03 samples/sec Loss 3.1668 LearningRate 0.0197 Epoch: 11 Global Step: 138190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:48,050-Speed 3199.01 samples/sec Loss 3.1163 LearningRate 0.0197 Epoch: 11 Global Step: 138200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:51,140-Speed 3315.30 samples/sec Loss 3.0902 
LearningRate 0.0197 Epoch: 11 Global Step: 138210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 13:41:54,297-Speed 3244.06 samples/sec Loss 3.1951 LearningRate 0.0197 Epoch: 11 Global Step: 138220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 13:41:57,373-Speed 3331.38 samples/sec Loss 3.1892 LearningRate 0.0197 Epoch: 11 Global Step: 138230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:00,470-Speed 3306.52 samples/sec Loss 3.1045 LearningRate 0.0197 Epoch: 11 Global Step: 138240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:03,591-Speed 3282.79 samples/sec Loss 3.1161 LearningRate 0.0197 Epoch: 11 Global Step: 138250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:06,702-Speed 3292.46 samples/sec Loss 3.0605 LearningRate 0.0197 Epoch: 11 Global Step: 138260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:09,793-Speed 3314.09 samples/sec Loss 3.0878 LearningRate 0.0197 Epoch: 11 Global Step: 138270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:12,920-Speed 3276.26 samples/sec Loss 3.1256 LearningRate 0.0197 Epoch: 11 Global Step: 138280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:16,063-Speed 3259.13 samples/sec Loss 3.0920 LearningRate 0.0197 Epoch: 11 Global Step: 138290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:19,197-Speed 3267.95 samples/sec Loss 3.1458 LearningRate 0.0196 Epoch: 11 Global Step: 138300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:22,334-Speed 3264.63 samples/sec Loss 3.1472 LearningRate 0.0196 Epoch: 11 Global Step: 138310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:25,490-Speed 3246.16 samples/sec Loss 3.0793 LearningRate 0.0196 Epoch: 11 Global Step: 138320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:28,566-Speed 3329.77 samples/sec Loss 3.1540 LearningRate 0.0196 Epoch: 11 Global 
Step: 138330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:31,692-Speed 3276.90 samples/sec Loss 3.1125 LearningRate 0.0196 Epoch: 11 Global Step: 138340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:34,762-Speed 3336.78 samples/sec Loss 3.1431 LearningRate 0.0196 Epoch: 11 Global Step: 138350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:37,865-Speed 3300.89 samples/sec Loss 3.0955 LearningRate 0.0196 Epoch: 11 Global Step: 138360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:40,965-Speed 3304.53 samples/sec Loss 3.1561 LearningRate 0.0196 Epoch: 11 Global Step: 138370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:44,101-Speed 3266.52 samples/sec Loss 3.1677 LearningRate 0.0196 Epoch: 11 Global Step: 138380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:47,243-Speed 3260.10 samples/sec Loss 3.1719 LearningRate 0.0196 Epoch: 11 Global Step: 138390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:42:50,439-Speed 3205.16 samples/sec Loss 3.1821 LearningRate 0.0196 Epoch: 11 Global Step: 138400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:53,637-Speed 3203.08 samples/sec Loss 3.1284 LearningRate 0.0196 Epoch: 11 Global Step: 138410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:56,729-Speed 3312.37 samples/sec Loss 3.1359 LearningRate 0.0196 Epoch: 11 Global Step: 138420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:42:59,895-Speed 3235.29 samples/sec Loss 3.1722 LearningRate 0.0196 Epoch: 11 Global Step: 138430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:03,105-Speed 3191.16 samples/sec Loss 3.1440 LearningRate 0.0196 Epoch: 11 Global Step: 138440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:06,186-Speed 3324.89 samples/sec Loss 3.1751 LearningRate 0.0196 Epoch: 11 Global Step: 138450 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-27 13:43:09,289-Speed 3301.36 samples/sec Loss 3.1460 LearningRate 0.0196 Epoch: 11 Global Step: 138460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:12,363-Speed 3331.79 samples/sec Loss 3.1939 LearningRate 0.0196 Epoch: 11 Global Step: 138470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:15,521-Speed 3244.64 samples/sec Loss 3.1063 LearningRate 0.0196 Epoch: 11 Global Step: 138480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:18,620-Speed 3305.26 samples/sec Loss 3.1581 LearningRate 0.0196 Epoch: 11 Global Step: 138490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:21,701-Speed 3324.25 samples/sec Loss 3.1050 LearningRate 0.0196 Epoch: 11 Global Step: 138500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:43:24,809-Speed 3296.44 samples/sec Loss 3.2375 LearningRate 0.0196 Epoch: 11 Global Step: 138510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:28,002-Speed 3207.42 samples/sec Loss 3.1288 LearningRate 0.0196 Epoch: 11 Global Step: 138520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:31,121-Speed 3284.84 samples/sec Loss 3.2221 LearningRate 0.0196 Epoch: 11 Global Step: 138530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:43:34,213-Speed 3313.13 samples/sec Loss 3.1042 LearningRate 0.0196 Epoch: 11 Global Step: 138540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:43:37,384-Speed 3229.94 samples/sec Loss 3.1283 LearningRate 0.0196 Epoch: 11 Global Step: 138550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:43:40,440-Speed 3351.80 samples/sec Loss 3.1360 LearningRate 0.0196 Epoch: 11 Global Step: 138560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:43:43,531-Speed 3313.37 samples/sec Loss 3.1580 LearningRate 0.0196 Epoch: 11 Global Step: 138570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
13:43:46,614-Speed 3323.27 samples/sec Loss 3.1217 LearningRate 0.0196 Epoch: 11 Global Step: 138580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:43:49,745-Speed 3270.91 samples/sec Loss 3.2129 LearningRate 0.0195 Epoch: 11 Global Step: 138590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:43:52,894-Speed 3252.90 samples/sec Loss 3.1876 LearningRate 0.0195 Epoch: 11 Global Step: 138600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:43:55,982-Speed 3317.48 samples/sec Loss 3.1486 LearningRate 0.0195 Epoch: 11 Global Step: 138610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:43:59,071-Speed 3315.90 samples/sec Loss 3.2081 LearningRate 0.0195 Epoch: 11 Global Step: 138620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:44:02,177-Speed 3297.32 samples/sec Loss 3.2059 LearningRate 0.0195 Epoch: 11 Global Step: 138630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:44:05,257-Speed 3326.44 samples/sec Loss 3.2874 LearningRate 0.0195 Epoch: 11 Global Step: 138640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:08,317-Speed 3347.19 samples/sec Loss 3.1658 LearningRate 0.0195 Epoch: 11 Global Step: 138650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:11,443-Speed 3276.52 samples/sec Loss 3.1957 LearningRate 0.0195 Epoch: 11 Global Step: 138660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:14,559-Speed 3287.62 samples/sec Loss 3.2645 LearningRate 0.0195 Epoch: 11 Global Step: 138670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:17,732-Speed 3227.99 samples/sec Loss 3.1410 LearningRate 0.0195 Epoch: 11 Global Step: 138680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:20,802-Speed 3336.59 samples/sec Loss 3.0828 LearningRate 0.0195 Epoch: 11 Global Step: 138690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:23,894-Speed 3312.94 samples/sec Loss 
3.1971 LearningRate 0.0195 Epoch: 11 Global Step: 138700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:26,989-Speed 3309.55 samples/sec Loss 3.1984 LearningRate 0.0195 Epoch: 11 Global Step: 138710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:30,155-Speed 3234.56 samples/sec Loss 3.1576 LearningRate 0.0195 Epoch: 11 Global Step: 138720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:33,252-Speed 3307.83 samples/sec Loss 3.1927 LearningRate 0.0195 Epoch: 11 Global Step: 138730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:44:36,367-Speed 3288.81 samples/sec Loss 3.1936 LearningRate 0.0195 Epoch: 11 Global Step: 138740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:44:39,537-Speed 3231.26 samples/sec Loss 3.1412 LearningRate 0.0195 Epoch: 11 Global Step: 138750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:44:42,729-Speed 3208.49 samples/sec Loss 3.2228 LearningRate 0.0195 Epoch: 11 Global Step: 138760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:44:45,806-Speed 3329.20 samples/sec Loss 3.1431 LearningRate 0.0195 Epoch: 11 Global Step: 138770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:44:48,928-Speed 3281.40 samples/sec Loss 3.2460 LearningRate 0.0195 Epoch: 11 Global Step: 138780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:44:52,066-Speed 3263.72 samples/sec Loss 3.1920 LearningRate 0.0195 Epoch: 11 Global Step: 138790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:44:55,283-Speed 3184.40 samples/sec Loss 3.1375 LearningRate 0.0195 Epoch: 11 Global Step: 138800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:44:58,401-Speed 3285.94 samples/sec Loss 3.1025 LearningRate 0.0195 Epoch: 11 Global Step: 138810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:01,656-Speed 3146.67 samples/sec Loss 3.1406 LearningRate 0.0195 Epoch: 11 Global 
Step: 138820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:04,887-Speed 3170.36 samples/sec Loss 3.1999 LearningRate 0.0195 Epoch: 11 Global Step: 138830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:07,981-Speed 3310.31 samples/sec Loss 3.1723 LearningRate 0.0195 Epoch: 11 Global Step: 138840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 13:45:11,037-Speed 3351.88 samples/sec Loss 3.2001 LearningRate 0.0195 Epoch: 11 Global Step: 138850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:14,286-Speed 3152.49 samples/sec Loss 3.1317 LearningRate 0.0195 Epoch: 11 Global Step: 138860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:17,378-Speed 3313.53 samples/sec Loss 3.1704 LearningRate 0.0194 Epoch: 11 Global Step: 138870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:20,506-Speed 3274.67 samples/sec Loss 3.1970 LearningRate 0.0194 Epoch: 11 Global Step: 138880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:23,629-Speed 3279.73 samples/sec Loss 3.1689 LearningRate 0.0194 Epoch: 11 Global Step: 138890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:26,728-Speed 3305.26 samples/sec Loss 3.2199 LearningRate 0.0194 Epoch: 11 Global Step: 138900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:29,864-Speed 3265.88 samples/sec Loss 3.2482 LearningRate 0.0194 Epoch: 11 Global Step: 138910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:32,958-Speed 3311.70 samples/sec Loss 3.2064 LearningRate 0.0194 Epoch: 11 Global Step: 138920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:36,161-Speed 3197.93 samples/sec Loss 3.1433 LearningRate 0.0194 Epoch: 11 Global Step: 138930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:39,332-Speed 3230.18 samples/sec Loss 3.2211 LearningRate 0.0194 Epoch: 11 Global Step: 138940 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-27 13:45:42,480-Speed 3253.38 samples/sec Loss 3.1174 LearningRate 0.0194 Epoch: 11 Global Step: 138950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:45,592-Speed 3292.64 samples/sec Loss 3.2635 LearningRate 0.0194 Epoch: 11 Global Step: 138960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:48,719-Speed 3275.38 samples/sec Loss 3.1800 LearningRate 0.0194 Epoch: 11 Global Step: 138970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:51,832-Speed 3290.29 samples/sec Loss 3.2012 LearningRate 0.0194 Epoch: 11 Global Step: 138980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:54,927-Speed 3309.64 samples/sec Loss 3.2528 LearningRate 0.0194 Epoch: 11 Global Step: 138990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:45:58,012-Speed 3319.94 samples/sec Loss 3.2638 LearningRate 0.0194 Epoch: 11 Global Step: 139000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:01,224-Speed 3189.20 samples/sec Loss 3.2277 LearningRate 0.0194 Epoch: 11 Global Step: 139010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:04,333-Speed 3295.12 samples/sec Loss 3.2251 LearningRate 0.0194 Epoch: 11 Global Step: 139020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:07,463-Speed 3272.18 samples/sec Loss 3.1265 LearningRate 0.0194 Epoch: 11 Global Step: 139030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:10,585-Speed 3281.38 samples/sec Loss 3.2849 LearningRate 0.0194 Epoch: 11 Global Step: 139040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:13,805-Speed 3181.31 samples/sec Loss 3.1645 LearningRate 0.0194 Epoch: 11 Global Step: 139050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 13:46:16,900-Speed 3309.02 samples/sec Loss 3.2053 LearningRate 0.0194 Epoch: 11 Global Step: 139060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
13:46:19,988-Speed 3317.41 samples/sec Loss 3.2056 LearningRate 0.0194 Epoch: 11 Global Step: 139070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:23,081-Speed 3311.52 samples/sec Loss 3.2100 LearningRate 0.0194 Epoch: 11 Global Step: 139080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:26,177-Speed 3308.94 samples/sec Loss 3.2263 LearningRate 0.0194 Epoch: 11 Global Step: 139090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:29,311-Speed 3267.71 samples/sec Loss 3.2677 LearningRate 0.0194 Epoch: 11 Global Step: 139100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:32,390-Speed 3327.47 samples/sec Loss 3.1792 LearningRate 0.0194 Epoch: 11 Global Step: 139110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:35,492-Speed 3301.83 samples/sec Loss 3.2040 LearningRate 0.0194 Epoch: 11 Global Step: 139120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:38,598-Speed 3298.79 samples/sec Loss 3.2583 LearningRate 0.0194 Epoch: 11 Global Step: 139130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:41,736-Speed 3263.42 samples/sec Loss 3.1888 LearningRate 0.0194 Epoch: 11 Global Step: 139140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:46:44,808-Speed 3334.45 samples/sec Loss 3.1994 LearningRate 0.0193 Epoch: 11 Global Step: 139150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:46:47,880-Speed 3334.48 samples/sec Loss 3.1752 LearningRate 0.0193 Epoch: 11 Global Step: 139160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:46:50,985-Speed 3298.86 samples/sec Loss 3.2656 LearningRate 0.0193 Epoch: 11 Global Step: 139170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:46:54,052-Speed 3339.76 samples/sec Loss 3.2296 LearningRate 0.0193 Epoch: 11 Global Step: 139180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:46:57,104-Speed 3357.16 samples/sec Loss 
3.2117 LearningRate 0.0193 Epoch: 11 Global Step: 139190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:00,210-Speed 3297.24 samples/sec Loss 3.2530 LearningRate 0.0193 Epoch: 11 Global Step: 139200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:03,325-Speed 3289.26 samples/sec Loss 3.2725 LearningRate 0.0193 Epoch: 11 Global Step: 139210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:06,451-Speed 3276.54 samples/sec Loss 3.1985 LearningRate 0.0193 Epoch: 11 Global Step: 139220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:09,544-Speed 3311.55 samples/sec Loss 3.1968 LearningRate 0.0193 Epoch: 11 Global Step: 139230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:12,633-Speed 3316.47 samples/sec Loss 3.2492 LearningRate 0.0193 Epoch: 11 Global Step: 139240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:15,724-Speed 3313.18 samples/sec Loss 3.2469 LearningRate 0.0193 Epoch: 11 Global Step: 139250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:47:18,817-Speed 3312.58 samples/sec Loss 3.2386 LearningRate 0.0193 Epoch: 11 Global Step: 139260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:47:21,875-Speed 3349.87 samples/sec Loss 3.1837 LearningRate 0.0193 Epoch: 11 Global Step: 139270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:47:25,041-Speed 3234.44 samples/sec Loss 3.2495 LearningRate 0.0193 Epoch: 11 Global Step: 139280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:47:28,186-Speed 3257.23 samples/sec Loss 3.2239 LearningRate 0.0193 Epoch: 11 Global Step: 139290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:47:31,368-Speed 3219.72 samples/sec Loss 3.2116 LearningRate 0.0193 Epoch: 11 Global Step: 139300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:34,518-Speed 3252.04 samples/sec Loss 3.2902 LearningRate 0.0193 Epoch: 11 Global 
Step: 139310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:37,647-Speed 3273.21 samples/sec Loss 3.2831 LearningRate 0.0193 Epoch: 11 Global Step: 139320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:40,780-Speed 3269.75 samples/sec Loss 3.2031 LearningRate 0.0193 Epoch: 11 Global Step: 139330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:43,914-Speed 3268.46 samples/sec Loss 3.1836 LearningRate 0.0193 Epoch: 11 Global Step: 139340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:47,030-Speed 3286.76 samples/sec Loss 3.3300 LearningRate 0.0193 Epoch: 11 Global Step: 139350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:50,150-Speed 3282.77 samples/sec Loss 3.3222 LearningRate 0.0193 Epoch: 11 Global Step: 139360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:53,218-Speed 3340.07 samples/sec Loss 3.2977 LearningRate 0.0193 Epoch: 11 Global Step: 139370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:56,275-Speed 3350.86 samples/sec Loss 3.2693 LearningRate 0.0193 Epoch: 11 Global Step: 139380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:47:59,347-Speed 3333.80 samples/sec Loss 3.2554 LearningRate 0.0193 Epoch: 11 Global Step: 139390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:02,483-Speed 3265.94 samples/sec Loss 3.2621 LearningRate 0.0193 Epoch: 11 Global Step: 139400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:48:05,587-Speed 3301.14 samples/sec Loss 3.2200 LearningRate 0.0193 Epoch: 11 Global Step: 139410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:48:08,644-Speed 3349.80 samples/sec Loss 3.2204 LearningRate 0.0193 Epoch: 11 Global Step: 139420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:11,760-Speed 3287.24 samples/sec Loss 3.2536 LearningRate 0.0192 Epoch: 11 Global Step: 139430 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-27 13:48:14,840-Speed 3325.92 samples/sec Loss 3.2710 LearningRate 0.0192 Epoch: 11 Global Step: 139440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:18,005-Speed 3236.76 samples/sec Loss 3.1879 LearningRate 0.0192 Epoch: 11 Global Step: 139450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:21,117-Speed 3291.51 samples/sec Loss 3.3015 LearningRate 0.0192 Epoch: 11 Global Step: 139460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:24,249-Speed 3270.40 samples/sec Loss 3.2510 LearningRate 0.0192 Epoch: 11 Global Step: 139470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:27,466-Speed 3183.82 samples/sec Loss 3.3434 LearningRate 0.0192 Epoch: 11 Global Step: 139480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:30,692-Speed 3175.80 samples/sec Loss 3.2592 LearningRate 0.0192 Epoch: 11 Global Step: 139490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:33,764-Speed 3334.20 samples/sec Loss 3.2478 LearningRate 0.0192 Epoch: 11 Global Step: 139500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:36,989-Speed 3176.16 samples/sec Loss 3.2518 LearningRate 0.0192 Epoch: 11 Global Step: 139510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:48:40,109-Speed 3282.68 samples/sec Loss 3.2531 LearningRate 0.0192 Epoch: 11 Global Step: 139520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:48:43,202-Speed 3312.51 samples/sec Loss 3.3528 LearningRate 0.0192 Epoch: 11 Global Step: 139530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:48:46,308-Speed 3297.82 samples/sec Loss 3.2559 LearningRate 0.0192 Epoch: 11 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:48:49,435-Speed 3275.21 samples/sec Loss 3.2653 LearningRate 0.0192 Epoch: 11 Global Step: 139550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
13:48:52,554-Speed 3284.20 samples/sec Loss 3.2914 LearningRate 0.0192 Epoch: 11 Global Step: 139560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:48:55,647-Speed 3311.68 samples/sec Loss 3.1999 LearningRate 0.0192 Epoch: 11 Global Step: 139570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:48:58,779-Speed 3271.03 samples/sec Loss 3.2495 LearningRate 0.0192 Epoch: 11 Global Step: 139580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:49:01,883-Speed 3299.81 samples/sec Loss 3.3061 LearningRate 0.0192 Epoch: 11 Global Step: 139590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:49:04,960-Speed 3329.16 samples/sec Loss 3.2620 LearningRate 0.0192 Epoch: 11 Global Step: 139600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:08,016-Speed 3351.49 samples/sec Loss 3.2473 LearningRate 0.0192 Epoch: 11 Global Step: 139610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:11,097-Speed 3324.94 samples/sec Loss 3.4010 LearningRate 0.0192 Epoch: 11 Global Step: 139620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:14,282-Speed 3216.19 samples/sec Loss 3.2803 LearningRate 0.0192 Epoch: 11 Global Step: 139630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:17,376-Speed 3309.97 samples/sec Loss 3.3379 LearningRate 0.0192 Epoch: 11 Global Step: 139640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:20,452-Speed 3330.49 samples/sec Loss 3.3389 LearningRate 0.0192 Epoch: 11 Global Step: 139650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:23,530-Speed 3328.23 samples/sec Loss 3.2073 LearningRate 0.0192 Epoch: 11 Global Step: 139660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:26,623-Speed 3311.74 samples/sec Loss 3.2849 LearningRate 0.0192 Epoch: 11 Global Step: 139670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:29,696-Speed 3332.95 samples/sec Loss 
3.3273 LearningRate 0.0192 Epoch: 11 Global Step: 139680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:32,802-Speed 3297.94 samples/sec Loss 3.2448 LearningRate 0.0192 Epoch: 11 Global Step: 139690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:35,942-Speed 3262.32 samples/sec Loss 3.3080 LearningRate 0.0192 Epoch: 11 Global Step: 139700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:49:39,026-Speed 3321.11 samples/sec Loss 3.3257 LearningRate 0.0191 Epoch: 11 Global Step: 139710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:49:42,064-Speed 3372.46 samples/sec Loss 3.2817 LearningRate 0.0191 Epoch: 11 Global Step: 139720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:45,181-Speed 3286.15 samples/sec Loss 3.3291 LearningRate 0.0191 Epoch: 11 Global Step: 139730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:48,336-Speed 3245.61 samples/sec Loss 3.3205 LearningRate 0.0191 Epoch: 11 Global Step: 139740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:51,439-Speed 3301.34 samples/sec Loss 3.3121 LearningRate 0.0191 Epoch: 11 Global Step: 139750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:54,671-Speed 3169.55 samples/sec Loss 3.3329 LearningRate 0.0191 Epoch: 11 Global Step: 139760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:49:57,757-Speed 3318.87 samples/sec Loss 3.3584 LearningRate 0.0191 Epoch: 11 Global Step: 139770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:00,892-Speed 3267.66 samples/sec Loss 3.2591 LearningRate 0.0191 Epoch: 11 Global Step: 139780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:04,058-Speed 3235.74 samples/sec Loss 3.2959 LearningRate 0.0191 Epoch: 11 Global Step: 139790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:07,164-Speed 3297.27 samples/sec Loss 3.3085 LearningRate 0.0191 Epoch: 11 Global 
Step: 139800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:10,252-Speed 3316.86 samples/sec Loss 3.2903 LearningRate 0.0191 Epoch: 11 Global Step: 139810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:13,333-Speed 3325.19 samples/sec Loss 3.2874 LearningRate 0.0191 Epoch: 11 Global Step: 139820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:50:16,471-Speed 3263.84 samples/sec Loss 3.2664 LearningRate 0.0191 Epoch: 11 Global Step: 139830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:50:19,545-Speed 3332.05 samples/sec Loss 3.2399 LearningRate 0.0191 Epoch: 11 Global Step: 139840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:22,634-Speed 3316.64 samples/sec Loss 3.2864 LearningRate 0.0191 Epoch: 11 Global Step: 139850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:25,728-Speed 3310.28 samples/sec Loss 3.3064 LearningRate 0.0191 Epoch: 11 Global Step: 139860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:28,846-Speed 3285.00 samples/sec Loss 3.2940 LearningRate 0.0191 Epoch: 11 Global Step: 139870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:31,942-Speed 3309.16 samples/sec Loss 3.2878 LearningRate 0.0191 Epoch: 11 Global Step: 139880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:35,065-Speed 3279.66 samples/sec Loss 3.3082 LearningRate 0.0191 Epoch: 11 Global Step: 139890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:38,142-Speed 3328.82 samples/sec Loss 3.2937 LearningRate 0.0191 Epoch: 11 Global Step: 139900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:41,328-Speed 3214.54 samples/sec Loss 3.2184 LearningRate 0.0191 Epoch: 11 Global Step: 139910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:44,472-Speed 3258.50 samples/sec Loss 3.2915 LearningRate 0.0191 Epoch: 11 Global Step: 139920 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-27 13:50:47,562-Speed 3314.79 samples/sec Loss 3.3602 LearningRate 0.0191 Epoch: 11 Global Step: 139930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:50:50,689-Speed 3276.04 samples/sec Loss 3.3344 LearningRate 0.0191 Epoch: 11 Global Step: 139940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:50:53,802-Speed 3290.64 samples/sec Loss 3.3327 LearningRate 0.0191 Epoch: 11 Global Step: 139950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:50:56,891-Speed 3315.95 samples/sec Loss 3.2338 LearningRate 0.0191 Epoch: 11 Global Step: 139960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:50:59,964-Speed 3333.07 samples/sec Loss 3.2635 LearningRate 0.0191 Epoch: 11 Global Step: 139970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:03,101-Speed 3265.26 samples/sec Loss 3.2097 LearningRate 0.0191 Epoch: 11 Global Step: 139980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:06,253-Speed 3249.88 samples/sec Loss 3.3287 LearningRate 0.0191 Epoch: 11 Global Step: 139990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:09,366-Speed 3289.95 samples/sec Loss 3.3354 LearningRate 0.0190 Epoch: 11 Global Step: 140000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:12,497-Speed 3271.96 samples/sec Loss 3.3043 LearningRate 0.0190 Epoch: 11 Global Step: 140010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:15,612-Speed 3288.40 samples/sec Loss 3.3262 LearningRate 0.0190 Epoch: 11 Global Step: 140020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:18,726-Speed 3288.58 samples/sec Loss 3.3160 LearningRate 0.0190 Epoch: 11 Global Step: 140030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:21,833-Speed 3297.05 samples/sec Loss 3.2808 LearningRate 0.0190 Epoch: 11 Global Step: 140040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
13:51:24,992-Speed 3243.03 samples/sec Loss 3.3146 LearningRate 0.0190 Epoch: 11 Global Step: 140050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:28,118-Speed 3277.19 samples/sec Loss 3.3378 LearningRate 0.0190 Epoch: 11 Global Step: 140060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:31,225-Speed 3296.55 samples/sec Loss 3.3151 LearningRate 0.0190 Epoch: 11 Global Step: 140070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:34,304-Speed 3326.85 samples/sec Loss 3.3134 LearningRate 0.0190 Epoch: 11 Global Step: 140080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:37,416-Speed 3291.81 samples/sec Loss 3.3041 LearningRate 0.0190 Epoch: 11 Global Step: 140090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:40,504-Speed 3316.83 samples/sec Loss 3.3347 LearningRate 0.0190 Epoch: 11 Global Step: 140100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:43,619-Speed 3287.76 samples/sec Loss 3.3140 LearningRate 0.0190 Epoch: 11 Global Step: 140110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:51:46,699-Speed 3326.35 samples/sec Loss 3.3141 LearningRate 0.0190 Epoch: 11 Global Step: 140120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:49,825-Speed 3276.36 samples/sec Loss 3.3322 LearningRate 0.0190 Epoch: 11 Global Step: 140130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:52,905-Speed 3326.32 samples/sec Loss 3.3012 LearningRate 0.0190 Epoch: 11 Global Step: 140140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:56,022-Speed 3286.27 samples/sec Loss 3.2751 LearningRate 0.0190 Epoch: 11 Global Step: 140150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:51:59,125-Speed 3301.57 samples/sec Loss 3.3996 LearningRate 0.0190 Epoch: 11 Global Step: 140160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:52:02,193-Speed 3337.94 samples/sec Loss 
3.3846 LearningRate 0.0190 Epoch: 11 Global Step: 140170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:05,349-Speed 3246.35 samples/sec Loss 3.2754 LearningRate 0.0190 Epoch: 11 Global Step: 140180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:08,425-Speed 3330.02 samples/sec Loss 3.3354 LearningRate 0.0190 Epoch: 11 Global Step: 140190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:11,533-Speed 3295.72 samples/sec Loss 3.2515 LearningRate 0.0190 Epoch: 11 Global Step: 140200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:14,663-Speed 3272.17 samples/sec Loss 3.3058 LearningRate 0.0190 Epoch: 11 Global Step: 140210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:17,836-Speed 3228.92 samples/sec Loss 3.2881 LearningRate 0.0190 Epoch: 11 Global Step: 140220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:20,946-Speed 3293.06 samples/sec Loss 3.3872 LearningRate 0.0190 Epoch: 11 Global Step: 140230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:24,050-Speed 3300.18 samples/sec Loss 3.3729 LearningRate 0.0190 Epoch: 11 Global Step: 140240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:27,179-Speed 3273.98 samples/sec Loss 3.3744 LearningRate 0.0190 Epoch: 11 Global Step: 140250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:30,332-Speed 3248.77 samples/sec Loss 3.2718 LearningRate 0.0190 Epoch: 11 Global Step: 140260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:33,403-Speed 3335.42 samples/sec Loss 3.2373 LearningRate 0.0190 Epoch: 11 Global Step: 140270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:52:36,512-Speed 3294.17 samples/sec Loss 3.2817 LearningRate 0.0189 Epoch: 11 Global Step: 140280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:39,615-Speed 3301.87 samples/sec Loss 3.2898 LearningRate 0.0189 Epoch: 11 Global 
Step: 140290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:42,722-Speed 3296.23 samples/sec Loss 3.3630 LearningRate 0.0189 Epoch: 11 Global Step: 140300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:45,792-Speed 3336.93 samples/sec Loss 3.3253 LearningRate 0.0189 Epoch: 11 Global Step: 140310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:48,918-Speed 3276.26 samples/sec Loss 3.2806 LearningRate 0.0189 Epoch: 11 Global Step: 140320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:52,044-Speed 3277.09 samples/sec Loss 3.3552 LearningRate 0.0189 Epoch: 11 Global Step: 140330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:55,184-Speed 3262.32 samples/sec Loss 3.2288 LearningRate 0.0189 Epoch: 11 Global Step: 140340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:52:58,273-Speed 3315.83 samples/sec Loss 3.2734 LearningRate 0.0189 Epoch: 11 Global Step: 140350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:01,387-Speed 3289.18 samples/sec Loss 3.3574 LearningRate 0.0189 Epoch: 11 Global Step: 140360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:04,490-Speed 3301.57 samples/sec Loss 3.3078 LearningRate 0.0189 Epoch: 11 Global Step: 140370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:07,606-Speed 3286.65 samples/sec Loss 3.3512 LearningRate 0.0189 Epoch: 11 Global Step: 140380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:53:10,724-Speed 3285.49 samples/sec Loss 3.3222 LearningRate 0.0189 Epoch: 11 Global Step: 140390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:53:13,842-Speed 3284.80 samples/sec Loss 3.3509 LearningRate 0.0189 Epoch: 11 Global Step: 140400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:53:16,972-Speed 3273.19 samples/sec Loss 3.3128 LearningRate 0.0189 Epoch: 11 Global Step: 140410 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-27 13:53:20,053-Speed 3324.69 samples/sec Loss 3.3279 LearningRate 0.0189 Epoch: 11 Global Step: 140420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:53:23,130-Speed 3328.59 samples/sec Loss 3.2204 LearningRate 0.0189 Epoch: 11 Global Step: 140430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:26,286-Speed 3245.88 samples/sec Loss 3.4259 LearningRate 0.0189 Epoch: 11 Global Step: 140440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:29,415-Speed 3273.85 samples/sec Loss 3.3583 LearningRate 0.0189 Epoch: 11 Global Step: 140450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:32,545-Speed 3272.05 samples/sec Loss 3.3644 LearningRate 0.0189 Epoch: 11 Global Step: 140460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:35,678-Speed 3270.05 samples/sec Loss 3.2926 LearningRate 0.0189 Epoch: 11 Global Step: 140470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:38,832-Speed 3246.98 samples/sec Loss 3.4356 LearningRate 0.0189 Epoch: 11 Global Step: 140480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:41,975-Speed 3259.61 samples/sec Loss 3.2676 LearningRate 0.0189 Epoch: 11 Global Step: 140490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:45,054-Speed 3326.64 samples/sec Loss 3.3292 LearningRate 0.0189 Epoch: 11 Global Step: 140500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:48,132-Speed 3328.25 samples/sec Loss 3.3331 LearningRate 0.0189 Epoch: 11 Global Step: 140510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:51,246-Speed 3288.73 samples/sec Loss 3.3247 LearningRate 0.0189 Epoch: 11 Global Step: 140520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:53:54,370-Speed 3278.80 samples/sec Loss 3.2245 LearningRate 0.0189 Epoch: 11 Global Step: 140530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 
13:53:57,453-Speed 3322.60 samples/sec Loss 3.3307 LearningRate 0.0189 Epoch: 11 Global Step: 140540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:00,546-Speed 3312.18 samples/sec Loss 3.2594 LearningRate 0.0189 Epoch: 11 Global Step: 140550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:03,700-Speed 3248.01 samples/sec Loss 3.3387 LearningRate 0.0189 Epoch: 11 Global Step: 140560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:06,863-Speed 3238.45 samples/sec Loss 3.3514 LearningRate 0.0188 Epoch: 11 Global Step: 140570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:09,937-Speed 3331.75 samples/sec Loss 3.3935 LearningRate 0.0188 Epoch: 11 Global Step: 140580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:13,090-Speed 3248.78 samples/sec Loss 3.3691 LearningRate 0.0188 Epoch: 11 Global Step: 140590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:16,213-Speed 3280.18 samples/sec Loss 3.3079 LearningRate 0.0188 Epoch: 11 Global Step: 140600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:19,394-Speed 3219.91 samples/sec Loss 3.3778 LearningRate 0.0188 Epoch: 11 Global Step: 140610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:22,486-Speed 3313.69 samples/sec Loss 3.3103 LearningRate 0.0188 Epoch: 11 Global Step: 140620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:25,600-Speed 3288.87 samples/sec Loss 3.3289 LearningRate 0.0188 Epoch: 11 Global Step: 140630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:28,730-Speed 3272.14 samples/sec Loss 3.3453 LearningRate 0.0188 Epoch: 11 Global Step: 140640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:31,904-Speed 3227.82 samples/sec Loss 3.3634 LearningRate 0.0188 Epoch: 11 Global Step: 140650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:35,073-Speed 3231.76 samples/sec Loss 
3.3660 LearningRate 0.0188 Epoch: 11 Global Step: 140660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:38,219-Speed 3256.70 samples/sec Loss 3.3232 LearningRate 0.0188 Epoch: 11 Global Step: 140670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:41,349-Speed 3272.54 samples/sec Loss 3.4350 LearningRate 0.0188 Epoch: 11 Global Step: 140680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:44,529-Speed 3221.11 samples/sec Loss 3.3090 LearningRate 0.0188 Epoch: 11 Global Step: 140690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:54:47,683-Speed 3248.40 samples/sec Loss 3.3347 LearningRate 0.0188 Epoch: 11 Global Step: 140700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:50,789-Speed 3297.41 samples/sec Loss 3.4283 LearningRate 0.0188 Epoch: 11 Global Step: 140710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:53,891-Speed 3301.62 samples/sec Loss 3.4109 LearningRate 0.0188 Epoch: 11 Global Step: 140720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:54:56,983-Speed 3313.32 samples/sec Loss 3.4534 LearningRate 0.0188 Epoch: 11 Global Step: 140730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:00,099-Speed 3287.69 samples/sec Loss 3.3945 LearningRate 0.0188 Epoch: 11 Global Step: 140740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:03,221-Speed 3280.36 samples/sec Loss 3.3059 LearningRate 0.0188 Epoch: 11 Global Step: 140750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:06,373-Speed 3249.51 samples/sec Loss 3.3833 LearningRate 0.0188 Epoch: 11 Global Step: 140760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:09,472-Speed 3306.07 samples/sec Loss 3.3397 LearningRate 0.0188 Epoch: 11 Global Step: 140770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:12,621-Speed 3252.11 samples/sec Loss 3.3200 LearningRate 0.0188 Epoch: 11 Global 
Step: 140780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:15,757-Speed 3266.45 samples/sec Loss 3.3351 LearningRate 0.0188 Epoch: 11 Global Step: 140790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:18,827-Speed 3336.28 samples/sec Loss 3.4107 LearningRate 0.0188 Epoch: 11 Global Step: 140800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:21,907-Speed 3325.76 samples/sec Loss 3.3494 LearningRate 0.0188 Epoch: 11 Global Step: 140810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:24,995-Speed 3317.59 samples/sec Loss 3.3483 LearningRate 0.0188 Epoch: 11 Global Step: 140820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:28,118-Speed 3280.17 samples/sec Loss 3.3453 LearningRate 0.0188 Epoch: 11 Global Step: 140830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:55:31,240-Speed 3280.02 samples/sec Loss 3.3244 LearningRate 0.0188 Epoch: 11 Global Step: 140840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:55:34,387-Speed 3254.95 samples/sec Loss 3.3906 LearningRate 0.0188 Epoch: 11 Global Step: 140850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:55:37,531-Speed 3258.22 samples/sec Loss 3.3771 LearningRate 0.0187 Epoch: 11 Global Step: 140860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:55:40,674-Speed 3259.26 samples/sec Loss 3.3885 LearningRate 0.0187 Epoch: 11 Global Step: 140870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:55:43,744-Speed 3336.26 samples/sec Loss 3.3664 LearningRate 0.0187 Epoch: 11 Global Step: 140880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:55:46,829-Speed 3320.37 samples/sec Loss 3.4062 LearningRate 0.0187 Epoch: 11 Global Step: 140890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:55:49,919-Speed 3314.87 samples/sec Loss 3.4396 LearningRate 0.0187 Epoch: 11 Global Step: 140900 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 13:55:53,087-Speed 3233.11 samples/sec Loss 3.4236 LearningRate 0.0187 Epoch: 11 Global Step: 140910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:55:56,232-Speed 3257.10 samples/sec Loss 3.3612 LearningRate 0.0187 Epoch: 11 Global Step: 140920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:55:59,335-Speed 3301.76 samples/sec Loss 3.3196 LearningRate 0.0187 Epoch: 11 Global Step: 140930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:02,411-Speed 3329.25 samples/sec Loss 3.3603 LearningRate 0.0187 Epoch: 11 Global Step: 140940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:05,573-Speed 3239.70 samples/sec Loss 3.3415 LearningRate 0.0187 Epoch: 11 Global Step: 140950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:08,667-Speed 3311.03 samples/sec Loss 3.2635 LearningRate 0.0187 Epoch: 11 Global Step: 140960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:11,759-Speed 3312.09 samples/sec Loss 3.3190 LearningRate 0.0187 Epoch: 11 Global Step: 140970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:14,838-Speed 3327.66 samples/sec Loss 3.2943 LearningRate 0.0187 Epoch: 11 Global Step: 140980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:17,935-Speed 3306.79 samples/sec Loss 3.4303 LearningRate 0.0187 Epoch: 11 Global Step: 140990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:21,019-Speed 3321.89 samples/sec Loss 3.3828 LearningRate 0.0187 Epoch: 11 Global Step: 141000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:56:24,134-Speed 3287.72 samples/sec Loss 3.3761 LearningRate 0.0187 Epoch: 11 Global Step: 141010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:56:27,290-Speed 3246.03 samples/sec Loss 3.3834 LearningRate 0.0187 Epoch: 11 Global Step: 141020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
13:56:30,517-Speed 3173.53 samples/sec Loss 3.4165 LearningRate 0.0187 Epoch: 11 Global Step: 141030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:56:33,586-Speed 3337.73 samples/sec Loss 3.3504 LearningRate 0.0187 Epoch: 11 Global Step: 141040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:36,745-Speed 3242.88 samples/sec Loss 3.3311 LearningRate 0.0187 Epoch: 11 Global Step: 141050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:39,819-Speed 3332.28 samples/sec Loss 3.4006 LearningRate 0.0187 Epoch: 11 Global Step: 141060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:42,932-Speed 3290.09 samples/sec Loss 3.3289 LearningRate 0.0187 Epoch: 11 Global Step: 141070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:46,023-Speed 3314.54 samples/sec Loss 3.3632 LearningRate 0.0187 Epoch: 11 Global Step: 141080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:49,093-Speed 3336.87 samples/sec Loss 3.3765 LearningRate 0.0187 Epoch: 11 Global Step: 141090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:52,197-Speed 3299.51 samples/sec Loss 3.3436 LearningRate 0.0187 Epoch: 11 Global Step: 141100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:55,408-Speed 3190.22 samples/sec Loss 3.2790 LearningRate 0.0187 Epoch: 11 Global Step: 141110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:56:58,478-Speed 3336.68 samples/sec Loss 3.3496 LearningRate 0.0187 Epoch: 11 Global Step: 141120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:57:01,565-Speed 3318.66 samples/sec Loss 3.4417 LearningRate 0.0187 Epoch: 11 Global Step: 141130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:57:04,704-Speed 3263.02 samples/sec Loss 3.3097 LearningRate 0.0186 Epoch: 11 Global Step: 141140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:07,760-Speed 3352.12 samples/sec Loss 
3.3391 LearningRate 0.0186 Epoch: 11 Global Step: 141150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:10,834-Speed 3332.30 samples/sec Loss 3.4423 LearningRate 0.0186 Epoch: 11 Global Step: 141160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:13,984-Speed 3252.16 samples/sec Loss 3.4114 LearningRate 0.0186 Epoch: 11 Global Step: 141170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:17,193-Speed 3192.14 samples/sec Loss 3.3152 LearningRate 0.0186 Epoch: 11 Global Step: 141180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:20,269-Speed 3329.77 samples/sec Loss 3.3376 LearningRate 0.0186 Epoch: 11 Global Step: 141190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:23,525-Speed 3145.79 samples/sec Loss 3.3786 LearningRate 0.0186 Epoch: 11 Global Step: 141200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:26,664-Speed 3263.56 samples/sec Loss 3.3984 LearningRate 0.0186 Epoch: 11 Global Step: 141210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:29,779-Speed 3288.29 samples/sec Loss 3.3971 LearningRate 0.0186 Epoch: 11 Global Step: 141220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:32,881-Speed 3302.61 samples/sec Loss 3.3154 LearningRate 0.0186 Epoch: 11 Global Step: 141230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:36,012-Speed 3270.53 samples/sec Loss 3.4299 LearningRate 0.0186 Epoch: 11 Global Step: 141240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:57:39,124-Speed 3291.72 samples/sec Loss 3.2813 LearningRate 0.0186 Epoch: 11 Global Step: 141250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:57:42,231-Speed 3297.62 samples/sec Loss 3.3475 LearningRate 0.0186 Epoch: 11 Global Step: 141260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:45,286-Speed 3352.97 samples/sec Loss 3.3740 LearningRate 0.0186 Epoch: 11 Global 
Step: 141270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:48,391-Speed 3298.30 samples/sec Loss 3.3622 LearningRate 0.0186 Epoch: 11 Global Step: 141280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:51,510-Speed 3284.75 samples/sec Loss 3.3288 LearningRate 0.0186 Epoch: 11 Global Step: 141290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:54,653-Speed 3258.86 samples/sec Loss 3.3647 LearningRate 0.0186 Epoch: 11 Global Step: 141300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:57:57,721-Speed 3339.10 samples/sec Loss 3.3862 LearningRate 0.0186 Epoch: 11 Global Step: 141310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:00,854-Speed 3268.80 samples/sec Loss 3.3894 LearningRate 0.0186 Epoch: 11 Global Step: 141320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:03,976-Speed 3280.68 samples/sec Loss 3.3030 LearningRate 0.0186 Epoch: 11 Global Step: 141330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:07,108-Speed 3270.51 samples/sec Loss 3.4233 LearningRate 0.0186 Epoch: 11 Global Step: 141340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:10,191-Speed 3323.17 samples/sec Loss 3.3524 LearningRate 0.0186 Epoch: 11 Global Step: 141350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:13,259-Speed 3338.27 samples/sec Loss 3.4972 LearningRate 0.0186 Epoch: 11 Global Step: 141360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:16,322-Speed 3344.52 samples/sec Loss 3.3834 LearningRate 0.0186 Epoch: 11 Global Step: 141370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:19,368-Speed 3362.53 samples/sec Loss 3.4401 LearningRate 0.0186 Epoch: 11 Global Step: 141380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:22,454-Speed 3319.70 samples/sec Loss 3.3647 LearningRate 0.0186 Epoch: 11 Global Step: 141390 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 13:58:25,527-Speed 3333.61 samples/sec Loss 3.4403 LearningRate 0.0186 Epoch: 11 Global Step: 141400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:28,638-Speed 3292.63 samples/sec Loss 3.3089 LearningRate 0.0186 Epoch: 11 Global Step: 141410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 13:58:31,724-Speed 3319.02 samples/sec Loss 3.4323 LearningRate 0.0186 Epoch: 11 Global Step: 141420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:34,804-Speed 3326.11 samples/sec Loss 3.3652 LearningRate 0.0185 Epoch: 11 Global Step: 141430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:37,956-Speed 3249.75 samples/sec Loss 3.3221 LearningRate 0.0185 Epoch: 11 Global Step: 141440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:41,007-Speed 3356.75 samples/sec Loss 3.3757 LearningRate 0.0185 Epoch: 11 Global Step: 141450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:44,071-Speed 3342.89 samples/sec Loss 3.4484 LearningRate 0.0185 Epoch: 11 Global Step: 141460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:47,177-Speed 3298.67 samples/sec Loss 3.4092 LearningRate 0.0185 Epoch: 11 Global Step: 141470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:50,309-Speed 3270.47 samples/sec Loss 3.4598 LearningRate 0.0185 Epoch: 11 Global Step: 141480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:53,409-Speed 3304.57 samples/sec Loss 3.4014 LearningRate 0.0185 Epoch: 11 Global Step: 141490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:56,464-Speed 3352.50 samples/sec Loss 3.4219 LearningRate 0.0185 Epoch: 11 Global Step: 141500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:58:59,541-Speed 3328.57 samples/sec Loss 3.3819 LearningRate 0.0185 Epoch: 11 Global Step: 141510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
13:59:02,600-Speed 3348.67 samples/sec Loss 3.3797 LearningRate 0.0185 Epoch: 11 Global Step: 141520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:59:05,673-Speed 3333.59 samples/sec Loss 3.3513 LearningRate 0.0185 Epoch: 11 Global Step: 141530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:08,751-Speed 3328.29 samples/sec Loss 3.3794 LearningRate 0.0185 Epoch: 11 Global Step: 141540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:11,871-Speed 3282.47 samples/sec Loss 3.4620 LearningRate 0.0185 Epoch: 11 Global Step: 141550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:15,068-Speed 3204.11 samples/sec Loss 3.4186 LearningRate 0.0185 Epoch: 11 Global Step: 141560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:18,212-Speed 3258.76 samples/sec Loss 3.3303 LearningRate 0.0185 Epoch: 11 Global Step: 141570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:21,293-Speed 3324.60 samples/sec Loss 3.4307 LearningRate 0.0185 Epoch: 11 Global Step: 141580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:24,359-Speed 3340.46 samples/sec Loss 3.4956 LearningRate 0.0185 Epoch: 11 Global Step: 141590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:27,417-Speed 3349.91 samples/sec Loss 3.3757 LearningRate 0.0185 Epoch: 11 Global Step: 141600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:30,518-Speed 3303.07 samples/sec Loss 3.4282 LearningRate 0.0185 Epoch: 11 Global Step: 141610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:33,598-Speed 3325.51 samples/sec Loss 3.4040 LearningRate 0.0185 Epoch: 11 Global Step: 141620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:36,689-Speed 3313.26 samples/sec Loss 3.4410 LearningRate 0.0185 Epoch: 11 Global Step: 141630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:59:39,826-Speed 3265.59 samples/sec Loss 
3.3845 LearningRate 0.0185 Epoch: 11 Global Step: 141640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 13:59:42,987-Speed 3241.20 samples/sec Loss 3.3716 LearningRate 0.0185 Epoch: 11 Global Step: 141650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:46,058-Speed 3335.53 samples/sec Loss 3.4766 LearningRate 0.0185 Epoch: 11 Global Step: 141660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:49,240-Speed 3219.28 samples/sec Loss 3.4016 LearningRate 0.0185 Epoch: 11 Global Step: 141670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:52,332-Speed 3312.32 samples/sec Loss 3.3374 LearningRate 0.0185 Epoch: 11 Global Step: 141680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:55,390-Speed 3349.21 samples/sec Loss 3.4716 LearningRate 0.0185 Epoch: 11 Global Step: 141690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 13:59:58,468-Speed 3328.31 samples/sec Loss 3.3621 LearningRate 0.0185 Epoch: 11 Global Step: 141700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:01,572-Speed 3299.90 samples/sec Loss 3.4454 LearningRate 0.0185 Epoch: 11 Global Step: 141710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:04,661-Speed 3316.63 samples/sec Loss 3.3634 LearningRate 0.0184 Epoch: 11 Global Step: 141720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:07,721-Speed 3347.35 samples/sec Loss 3.4366 LearningRate 0.0184 Epoch: 11 Global Step: 141730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:10,773-Speed 3356.49 samples/sec Loss 3.4710 LearningRate 0.0184 Epoch: 11 Global Step: 141740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:13,927-Speed 3247.00 samples/sec Loss 3.4175 LearningRate 0.0184 Epoch: 11 Global Step: 141750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:17,083-Speed 3246.59 samples/sec Loss 3.3677 LearningRate 0.0184 Epoch: 11 Global 
Step: 141760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:20,240-Speed 3244.01 samples/sec Loss 3.4163 LearningRate 0.0184 Epoch: 11 Global Step: 141770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:23,302-Speed 3345.44 samples/sec Loss 3.3902 LearningRate 0.0184 Epoch: 11 Global Step: 141780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:26,412-Speed 3293.51 samples/sec Loss 3.4842 LearningRate 0.0184 Epoch: 11 Global Step: 141790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:29,528-Speed 3286.95 samples/sec Loss 3.4244 LearningRate 0.0184 Epoch: 11 Global Step: 141800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:32,608-Speed 3325.81 samples/sec Loss 3.3593 LearningRate 0.0184 Epoch: 11 Global Step: 141810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:35,709-Speed 3303.77 samples/sec Loss 3.4071 LearningRate 0.0184 Epoch: 11 Global Step: 141820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:38,784-Speed 3330.64 samples/sec Loss 3.4430 LearningRate 0.0184 Epoch: 11 Global Step: 141830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:41,844-Speed 3347.35 samples/sec Loss 3.3025 LearningRate 0.0184 Epoch: 11 Global Step: 141840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:00:44,873-Speed 3382.36 samples/sec Loss 3.4277 LearningRate 0.0184 Epoch: 11 Global Step: 141850 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:00:47,972-Speed 3305.39 samples/sec Loss 3.3127 LearningRate 0.0184 Epoch: 11 Global Step: 141860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:00:51,093-Speed 3281.64 samples/sec Loss 3.4903 LearningRate 0.0184 Epoch: 11 Global Step: 141870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:00:54,239-Speed 3256.04 samples/sec Loss 3.4390 LearningRate 0.0184 Epoch: 11 Global Step: 141880 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 14:00:57,307-Speed 3339.07 samples/sec Loss 3.3270 LearningRate 0.0184 Epoch: 11 Global Step: 141890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:01:00,411-Speed 3299.66 samples/sec Loss 3.4901 LearningRate 0.0184 Epoch: 11 Global Step: 141900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:01:03,592-Speed 3220.45 samples/sec Loss 3.5091 LearningRate 0.0184 Epoch: 11 Global Step: 141910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:01:06,644-Speed 3356.46 samples/sec Loss 3.4035 LearningRate 0.0184 Epoch: 11 Global Step: 141920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:01:09,753-Speed 3294.20 samples/sec Loss 3.3630 LearningRate 0.0184 Epoch: 11 Global Step: 141930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:01:12,856-Speed 3301.65 samples/sec Loss 3.4004 LearningRate 0.0184 Epoch: 11 Global Step: 141940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:01:15,983-Speed 3275.37 samples/sec Loss 3.4221 LearningRate 0.0184 Epoch: 11 Global Step: 141950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:19,118-Speed 3267.37 samples/sec Loss 3.3251 LearningRate 0.0184 Epoch: 11 Global Step: 141960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:22,164-Speed 3362.97 samples/sec Loss 3.3122 LearningRate 0.0184 Epoch: 11 Global Step: 141970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:25,281-Speed 3286.13 samples/sec Loss 3.4548 LearningRate 0.0184 Epoch: 11 Global Step: 141980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:28,386-Speed 3299.29 samples/sec Loss 3.3949 LearningRate 0.0184 Epoch: 11 Global Step: 141990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:31,484-Speed 3306.14 samples/sec Loss 3.4791 LearningRate 0.0184 Epoch: 11 Global Step: 142000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
14:01:34,652-Speed 3233.32 samples/sec Loss 3.4414 LearningRate 0.0183 Epoch: 11 Global Step: 142010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:37,852-Speed 3201.62 samples/sec Loss 3.5097 LearningRate 0.0183 Epoch: 11 Global Step: 142020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:40,943-Speed 3314.22 samples/sec Loss 3.4678 LearningRate 0.0183 Epoch: 11 Global Step: 142030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:44,010-Speed 3339.82 samples/sec Loss 3.3994 LearningRate 0.0183 Epoch: 11 Global Step: 142040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:47,126-Speed 3286.86 samples/sec Loss 3.4569 LearningRate 0.0183 Epoch: 11 Global Step: 142050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:50,200-Speed 3332.04 samples/sec Loss 3.4155 LearningRate 0.0183 Epoch: 11 Global Step: 142060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:53,290-Speed 3314.67 samples/sec Loss 3.4020 LearningRate 0.0183 Epoch: 11 Global Step: 142070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:56,367-Speed 3330.24 samples/sec Loss 3.4327 LearningRate 0.0183 Epoch: 11 Global Step: 142080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:01:59,431-Speed 3342.28 samples/sec Loss 3.3997 LearningRate 0.0183 Epoch: 11 Global Step: 142090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:02,513-Speed 3323.47 samples/sec Loss 3.3889 LearningRate 0.0183 Epoch: 11 Global Step: 142100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:05,608-Speed 3310.26 samples/sec Loss 3.3316 LearningRate 0.0183 Epoch: 11 Global Step: 142110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:08,677-Speed 3336.88 samples/sec Loss 3.4565 LearningRate 0.0183 Epoch: 11 Global Step: 142120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:11,740-Speed 3344.17 samples/sec Loss 
3.4385 LearningRate 0.0183 Epoch: 11 Global Step: 142130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:14,822-Speed 3323.75 samples/sec Loss 3.3938 LearningRate 0.0183 Epoch: 11 Global Step: 142140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:17,918-Speed 3309.16 samples/sec Loss 3.3667 LearningRate 0.0183 Epoch: 11 Global Step: 142150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:02:20,980-Speed 3344.83 samples/sec Loss 3.3868 LearningRate 0.0183 Epoch: 11 Global Step: 142160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:24,067-Speed 3319.04 samples/sec Loss 3.4388 LearningRate 0.0183 Epoch: 11 Global Step: 142170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:27,130-Speed 3343.24 samples/sec Loss 3.4377 LearningRate 0.0183 Epoch: 11 Global Step: 142180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:30,236-Speed 3297.90 samples/sec Loss 3.3340 LearningRate 0.0183 Epoch: 11 Global Step: 142190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:33,334-Speed 3306.64 samples/sec Loss 3.3496 LearningRate 0.0183 Epoch: 11 Global Step: 142200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:02:36,408-Speed 3331.99 samples/sec Loss 3.4518 LearningRate 0.0183 Epoch: 11 Global Step: 142210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:02:39,552-Speed 3257.75 samples/sec Loss 3.4010 LearningRate 0.0183 Epoch: 11 Global Step: 142220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:02:42,772-Speed 3181.88 samples/sec Loss 3.4062 LearningRate 0.0183 Epoch: 11 Global Step: 142230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:02:45,834-Speed 3345.08 samples/sec Loss 3.4610 LearningRate 0.0183 Epoch: 11 Global Step: 142240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:02:48,904-Speed 3336.22 samples/sec Loss 3.4122 LearningRate 0.0183 Epoch: 11 Global 
Step: 142250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:02:52,048-Speed 3258.09 samples/sec Loss 3.4514 LearningRate 0.0183 Epoch: 11 Global Step: 142260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:02:55,137-Speed 3316.62 samples/sec Loss 3.4621 LearningRate 0.0183 Epoch: 11 Global Step: 142270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:02:58,217-Speed 3325.46 samples/sec Loss 3.4431 LearningRate 0.0183 Epoch: 11 Global Step: 142280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:03:01,354-Speed 3264.88 samples/sec Loss 3.4260 LearningRate 0.0183 Epoch: 11 Global Step: 142290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:03:04,433-Speed 3326.88 samples/sec Loss 3.4895 LearningRate 0.0182 Epoch: 11 Global Step: 142300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:03:07,504-Speed 3335.46 samples/sec Loss 3.3594 LearningRate 0.0182 Epoch: 11 Global Step: 142310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:10,566-Speed 3345.26 samples/sec Loss 3.4406 LearningRate 0.0182 Epoch: 11 Global Step: 142320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:13,676-Speed 3292.98 samples/sec Loss 3.5123 LearningRate 0.0182 Epoch: 11 Global Step: 142330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:16,763-Speed 3318.53 samples/sec Loss 3.4548 LearningRate 0.0182 Epoch: 11 Global Step: 142340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:19,901-Speed 3264.83 samples/sec Loss 3.4174 LearningRate 0.0182 Epoch: 11 Global Step: 142350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:22,993-Speed 3312.71 samples/sec Loss 3.4671 LearningRate 0.0182 Epoch: 11 Global Step: 142360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:26,110-Speed 3286.60 samples/sec Loss 3.5378 LearningRate 0.0182 Epoch: 11 Global Step: 142370 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-27 14:03:29,191-Speed 3324.60 samples/sec Loss 3.4702 LearningRate 0.0182 Epoch: 11 Global Step: 142380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:32,306-Speed 3288.39 samples/sec Loss 3.3447 LearningRate 0.0182 Epoch: 11 Global Step: 142390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:35,464-Speed 3243.42 samples/sec Loss 3.4405 LearningRate 0.0182 Epoch: 11 Global Step: 142400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:38,554-Speed 3315.13 samples/sec Loss 3.4272 LearningRate 0.0182 Epoch: 11 Global Step: 142410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:03:41,638-Speed 3321.64 samples/sec Loss 3.4160 LearningRate 0.0182 Epoch: 11 Global Step: 142420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:44,698-Speed 3347.12 samples/sec Loss 3.4492 LearningRate 0.0182 Epoch: 11 Global Step: 142430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:47,821-Speed 3279.56 samples/sec Loss 3.3662 LearningRate 0.0182 Epoch: 11 Global Step: 142440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:50,982-Speed 3240.48 samples/sec Loss 3.4650 LearningRate 0.0182 Epoch: 11 Global Step: 142450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:54,128-Speed 3255.84 samples/sec Loss 3.4438 LearningRate 0.0182 Epoch: 11 Global Step: 142460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:03:57,202-Speed 3332.80 samples/sec Loss 3.4654 LearningRate 0.0182 Epoch: 11 Global Step: 142470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:00,307-Speed 3298.64 samples/sec Loss 3.3656 LearningRate 0.0182 Epoch: 11 Global Step: 142480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:03,377-Speed 3336.75 samples/sec Loss 3.3526 LearningRate 0.0182 Epoch: 11 Global Step: 142490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
14:04:06,464-Speed 3317.76 samples/sec Loss 3.4210 LearningRate 0.0182 Epoch: 11 Global Step: 142500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:09,516-Speed 3356.81 samples/sec Loss 3.5067 LearningRate 0.0182 Epoch: 11 Global Step: 142510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:12,628-Speed 3291.54 samples/sec Loss 3.4801 LearningRate 0.0182 Epoch: 11 Global Step: 142520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:15,748-Speed 3282.85 samples/sec Loss 3.4719 LearningRate 0.0182 Epoch: 11 Global Step: 142530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:18,829-Speed 3324.45 samples/sec Loss 3.4175 LearningRate 0.0182 Epoch: 11 Global Step: 142540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:21,891-Speed 3345.83 samples/sec Loss 3.4227 LearningRate 0.0182 Epoch: 11 Global Step: 142550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:25,000-Speed 3295.11 samples/sec Loss 3.4201 LearningRate 0.0182 Epoch: 11 Global Step: 142560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:28,191-Speed 3209.36 samples/sec Loss 3.4176 LearningRate 0.0182 Epoch: 11 Global Step: 142570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:31,264-Speed 3333.41 samples/sec Loss 3.3642 LearningRate 0.0182 Epoch: 11 Global Step: 142580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:34,338-Speed 3332.02 samples/sec Loss 3.4657 LearningRate 0.0181 Epoch: 11 Global Step: 142590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:37,423-Speed 3320.81 samples/sec Loss 3.4526 LearningRate 0.0181 Epoch: 11 Global Step: 142600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:40,484-Speed 3346.46 samples/sec Loss 3.4574 LearningRate 0.0181 Epoch: 11 Global Step: 142610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:43,574-Speed 3315.14 samples/sec Loss 
3.4010 LearningRate 0.0181 Epoch: 11 Global Step: 142620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:46,633-Speed 3348.58 samples/sec Loss 3.4494 LearningRate 0.0181 Epoch: 11 Global Step: 142630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:49,773-Speed 3263.00 samples/sec Loss 3.4100 LearningRate 0.0181 Epoch: 11 Global Step: 142640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:52,951-Speed 3222.91 samples/sec Loss 3.4512 LearningRate 0.0181 Epoch: 11 Global Step: 142650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:56,018-Speed 3340.02 samples/sec Loss 3.4470 LearningRate 0.0181 Epoch: 11 Global Step: 142660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:04:59,140-Speed 3280.44 samples/sec Loss 3.4671 LearningRate 0.0181 Epoch: 11 Global Step: 142670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:02,279-Speed 3263.40 samples/sec Loss 3.4126 LearningRate 0.0181 Epoch: 11 Global Step: 142680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:05,412-Speed 3270.39 samples/sec Loss 3.4115 LearningRate 0.0181 Epoch: 11 Global Step: 142690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:08,516-Speed 3300.56 samples/sec Loss 3.4553 LearningRate 0.0181 Epoch: 11 Global Step: 142700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:11,575-Speed 3348.11 samples/sec Loss 3.4514 LearningRate 0.0181 Epoch: 11 Global Step: 142710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:14,703-Speed 3275.09 samples/sec Loss 3.4558 LearningRate 0.0181 Epoch: 11 Global Step: 142720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:05:17,814-Speed 3292.70 samples/sec Loss 3.5082 LearningRate 0.0181 Epoch: 11 Global Step: 142730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:20,879-Speed 3341.80 samples/sec Loss 3.3899 LearningRate 0.0181 Epoch: 11 Global 
Step: 142740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:23,952-Speed 3332.66 samples/sec Loss 3.5035 LearningRate 0.0181 Epoch: 11 Global Step: 142750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:27,113-Speed 3241.07 samples/sec Loss 3.4264 LearningRate 0.0181 Epoch: 11 Global Step: 142760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:30,187-Speed 3332.05 samples/sec Loss 3.4435 LearningRate 0.0181 Epoch: 11 Global Step: 142770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:33,310-Speed 3279.59 samples/sec Loss 3.5174 LearningRate 0.0181 Epoch: 11 Global Step: 142780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:36,467-Speed 3245.38 samples/sec Loss 3.4739 LearningRate 0.0181 Epoch: 11 Global Step: 142790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:39,588-Speed 3282.34 samples/sec Loss 3.4570 LearningRate 0.0181 Epoch: 11 Global Step: 142800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:42,763-Speed 3225.96 samples/sec Loss 3.4070 LearningRate 0.0181 Epoch: 11 Global Step: 142810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:45,840-Speed 3328.24 samples/sec Loss 3.4903 LearningRate 0.0181 Epoch: 11 Global Step: 142820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:05:48,931-Speed 3314.30 samples/sec Loss 3.4987 LearningRate 0.0181 Epoch: 11 Global Step: 142830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:05:52,103-Speed 3229.19 samples/sec Loss 3.5235 LearningRate 0.0181 Epoch: 11 Global Step: 142840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:05:55,186-Speed 3323.25 samples/sec Loss 3.4300 LearningRate 0.0181 Epoch: 11 Global Step: 142850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:05:58,246-Speed 3347.48 samples/sec Loss 3.4352 LearningRate 0.0181 Epoch: 11 Global Step: 142860 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-27 14:06:01,336-Speed 3315.23 samples/sec Loss 3.3854 LearningRate 0.0181 Epoch: 11 Global Step: 142870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:04,495-Speed 3242.14 samples/sec Loss 3.5052 LearningRate 0.0180 Epoch: 11 Global Step: 142880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:07,619-Speed 3278.69 samples/sec Loss 3.4618 LearningRate 0.0180 Epoch: 11 Global Step: 142890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:10,689-Speed 3337.19 samples/sec Loss 3.4476 LearningRate 0.0180 Epoch: 11 Global Step: 142900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:13,771-Speed 3324.05 samples/sec Loss 3.4994 LearningRate 0.0180 Epoch: 11 Global Step: 142910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:16,831-Speed 3347.19 samples/sec Loss 3.4775 LearningRate 0.0180 Epoch: 11 Global Step: 142920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:19,890-Speed 3349.04 samples/sec Loss 3.4393 LearningRate 0.0180 Epoch: 11 Global Step: 142930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:22,938-Speed 3359.81 samples/sec Loss 3.4634 LearningRate 0.0180 Epoch: 11 Global Step: 142940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:26,024-Speed 3319.99 samples/sec Loss 3.4547 LearningRate 0.0180 Epoch: 11 Global Step: 142950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:29,076-Speed 3356.47 samples/sec Loss 3.4581 LearningRate 0.0180 Epoch: 11 Global Step: 142960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:32,142-Speed 3340.73 samples/sec Loss 3.5025 LearningRate 0.0180 Epoch: 11 Global Step: 142970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:35,211-Speed 3337.75 samples/sec Loss 3.4183 LearningRate 0.0180 Epoch: 11 Global Step: 142980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:06:38,289-Speed 3327.26 samples/sec Loss 3.5031 LearningRate 0.0180 Epoch: 11 Global Step: 142990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:41,402-Speed 3290.63 samples/sec Loss 3.4420 LearningRate 0.0180 Epoch: 11 Global Step: 143000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:44,554-Speed 3249.40 samples/sec Loss 3.4361 LearningRate 0.0180 Epoch: 11 Global Step: 143010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:47,754-Speed 3200.98 samples/sec Loss 3.4825 LearningRate 0.0180 Epoch: 11 Global Step: 143020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:50,887-Speed 3270.21 samples/sec Loss 3.4884 LearningRate 0.0180 Epoch: 11 Global Step: 143030 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:06:54,049-Speed 3238.56 samples/sec Loss 3.4312 LearningRate 0.0180 Epoch: 11 Global Step: 143040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:06:57,157-Speed 3296.95 samples/sec Loss 3.4655 LearningRate 0.0180 Epoch: 11 Global Step: 143050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:00,248-Speed 3313.03 samples/sec Loss 3.3880 LearningRate 0.0180 Epoch: 11 Global Step: 143060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:03,323-Speed 3331.52 samples/sec Loss 3.4489 LearningRate 0.0180 Epoch: 11 Global Step: 143070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:06,434-Speed 3293.08 samples/sec Loss 3.4935 LearningRate 0.0180 Epoch: 11 Global Step: 143080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:09,489-Speed 3352.74 samples/sec Loss 3.4103 LearningRate 0.0180 Epoch: 11 Global Step: 143090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:12,591-Speed 3302.41 samples/sec Loss 3.5001 LearningRate 0.0180 Epoch: 11 Global Step: 143100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:15,713-Speed 3280.53 samples/sec Loss 
3.4101 LearningRate 0.0180 Epoch: 11 Global Step: 143110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:18,835-Speed 3280.89 samples/sec Loss 3.4304 LearningRate 0.0180 Epoch: 11 Global Step: 143120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:21,900-Speed 3342.75 samples/sec Loss 3.5107 LearningRate 0.0180 Epoch: 11 Global Step: 143130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:24,974-Speed 3332.21 samples/sec Loss 3.4440 LearningRate 0.0180 Epoch: 11 Global Step: 143140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:28,198-Speed 3176.69 samples/sec Loss 3.3969 LearningRate 0.0180 Epoch: 11 Global Step: 143150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:31,284-Speed 3319.64 samples/sec Loss 3.4587 LearningRate 0.0180 Epoch: 11 Global Step: 143160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:34,449-Speed 3236.91 samples/sec Loss 3.5175 LearningRate 0.0180 Epoch: 11 Global Step: 143170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:37,526-Speed 3328.54 samples/sec Loss 3.4211 LearningRate 0.0179 Epoch: 11 Global Step: 143180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:40,577-Speed 3357.38 samples/sec Loss 3.4049 LearningRate 0.0179 Epoch: 11 Global Step: 143190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:43,630-Speed 3355.25 samples/sec Loss 3.4213 LearningRate 0.0179 Epoch: 11 Global Step: 143200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:46,684-Speed 3354.26 samples/sec Loss 3.5014 LearningRate 0.0179 Epoch: 11 Global Step: 143210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:49,828-Speed 3257.72 samples/sec Loss 3.5074 LearningRate 0.0179 Epoch: 11 Global Step: 143220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:52,963-Speed 3267.94 samples/sec Loss 3.3323 LearningRate 0.0179 Epoch: 11 Global 
Step: 143230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:07:56,024-Speed 3345.68 samples/sec Loss 3.3646 LearningRate 0.0179 Epoch: 11 Global Step: 143240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:07:59,105-Speed 3325.11 samples/sec Loss 3.3622 LearningRate 0.0179 Epoch: 11 Global Step: 143250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:02,169-Speed 3343.09 samples/sec Loss 3.5385 LearningRate 0.0179 Epoch: 11 Global Step: 143260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:05,257-Speed 3317.99 samples/sec Loss 3.4318 LearningRate 0.0179 Epoch: 11 Global Step: 143270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:08,352-Speed 3309.43 samples/sec Loss 3.5472 LearningRate 0.0179 Epoch: 11 Global Step: 143280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:11,452-Speed 3304.02 samples/sec Loss 3.3749 LearningRate 0.0179 Epoch: 11 Global Step: 143290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:14,537-Speed 3319.92 samples/sec Loss 3.3797 LearningRate 0.0179 Epoch: 11 Global Step: 143300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:17,593-Speed 3351.78 samples/sec Loss 3.4852 LearningRate 0.0179 Epoch: 11 Global Step: 143310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:20,653-Speed 3348.22 samples/sec Loss 3.5390 LearningRate 0.0179 Epoch: 11 Global Step: 143320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:23,734-Speed 3323.70 samples/sec Loss 3.4989 LearningRate 0.0179 Epoch: 11 Global Step: 143330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:26,910-Speed 3225.98 samples/sec Loss 3.4271 LearningRate 0.0179 Epoch: 11 Global Step: 143340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:08:29,966-Speed 3351.79 samples/sec Loss 3.5140 LearningRate 0.0179 Epoch: 11 Global Step: 143350 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-27 14:08:33,049-Speed 3322.56 samples/sec Loss 3.4713 LearningRate 0.0179 Epoch: 11 Global Step: 143360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:36,181-Speed 3270.26 samples/sec Loss 3.4569 LearningRate 0.0179 Epoch: 11 Global Step: 143370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:39,333-Speed 3249.93 samples/sec Loss 3.4578 LearningRate 0.0179 Epoch: 11 Global Step: 143380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:42,487-Speed 3247.77 samples/sec Loss 3.3535 LearningRate 0.0179 Epoch: 11 Global Step: 143390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:45,537-Speed 3358.39 samples/sec Loss 3.4335 LearningRate 0.0179 Epoch: 11 Global Step: 143400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:48,613-Speed 3330.16 samples/sec Loss 3.4297 LearningRate 0.0179 Epoch: 11 Global Step: 143410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:51,711-Speed 3306.40 samples/sec Loss 3.4135 LearningRate 0.0179 Epoch: 11 Global Step: 143420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:54,825-Speed 3288.20 samples/sec Loss 3.4493 LearningRate 0.0179 Epoch: 11 Global Step: 143430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:08:57,916-Speed 3314.82 samples/sec Loss 3.4363 LearningRate 0.0179 Epoch: 11 Global Step: 143440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:00,966-Speed 3358.13 samples/sec Loss 3.4273 LearningRate 0.0179 Epoch: 11 Global Step: 143450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:04,086-Speed 3283.33 samples/sec Loss 3.4579 LearningRate 0.0179 Epoch: 11 Global Step: 143460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:07,193-Speed 3296.77 samples/sec Loss 3.4714 LearningRate 0.0178 Epoch: 11 Global Step: 143470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
14:09:10,277-Speed 3321.45 samples/sec Loss 3.5203 LearningRate 0.0178 Epoch: 11 Global Step: 143480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:13,382-Speed 3298.93 samples/sec Loss 3.5463 LearningRate 0.0178 Epoch: 11 Global Step: 143490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:16,467-Speed 3320.43 samples/sec Loss 3.5075 LearningRate 0.0178 Epoch: 11 Global Step: 143500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:19,548-Speed 3324.95 samples/sec Loss 3.4199 LearningRate 0.0178 Epoch: 11 Global Step: 143510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:22,627-Speed 3326.87 samples/sec Loss 3.4578 LearningRate 0.0178 Epoch: 11 Global Step: 143520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:25,764-Speed 3264.04 samples/sec Loss 3.5137 LearningRate 0.0178 Epoch: 11 Global Step: 143530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:09:28,900-Speed 3267.41 samples/sec Loss 3.4710 LearningRate 0.0178 Epoch: 11 Global Step: 143540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:31,975-Speed 3330.94 samples/sec Loss 3.4549 LearningRate 0.0178 Epoch: 11 Global Step: 143550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:35,038-Speed 3343.60 samples/sec Loss 3.4528 LearningRate 0.0178 Epoch: 11 Global Step: 143560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:38,223-Speed 3216.57 samples/sec Loss 3.4422 LearningRate 0.0178 Epoch: 11 Global Step: 143570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:41,290-Speed 3340.18 samples/sec Loss 3.5007 LearningRate 0.0178 Epoch: 11 Global Step: 143580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:44,391-Speed 3302.69 samples/sec Loss 3.4467 LearningRate 0.0178 Epoch: 11 Global Step: 143590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:47,531-Speed 3262.30 samples/sec Loss 
3.4512 LearningRate 0.0178 Epoch: 11 Global Step: 143600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:50,674-Speed 3258.76 samples/sec Loss 3.4794 LearningRate 0.0178 Epoch: 11 Global Step: 143610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:53,767-Speed 3312.57 samples/sec Loss 3.4419 LearningRate 0.0178 Epoch: 11 Global Step: 143620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:56,826-Speed 3348.67 samples/sec Loss 3.4234 LearningRate 0.0178 Epoch: 11 Global Step: 143630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:09:59,966-Speed 3262.38 samples/sec Loss 3.4987 LearningRate 0.0178 Epoch: 11 Global Step: 143640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:10:03,052-Speed 3318.63 samples/sec Loss 3.4258 LearningRate 0.0178 Epoch: 11 Global Step: 143650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:10:06,168-Speed 3287.75 samples/sec Loss 3.5247 LearningRate 0.0178 Epoch: 11 Global Step: 143660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:10:09,209-Speed 3367.92 samples/sec Loss 3.4005 LearningRate 0.0178 Epoch: 11 Global Step: 143670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:12,361-Speed 3249.60 samples/sec Loss 3.4498 LearningRate 0.0178 Epoch: 11 Global Step: 143680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:15,535-Speed 3227.66 samples/sec Loss 3.5470 LearningRate 0.0178 Epoch: 11 Global Step: 143690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:18,689-Speed 3247.64 samples/sec Loss 3.4504 LearningRate 0.0178 Epoch: 11 Global Step: 143700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:21,790-Speed 3303.91 samples/sec Loss 3.4787 LearningRate 0.0178 Epoch: 11 Global Step: 143710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:24,863-Speed 3332.77 samples/sec Loss 3.4053 LearningRate 0.0178 Epoch: 11 Global 
Step: 143720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:27,955-Speed 3312.36 samples/sec Loss 3.4838 LearningRate 0.0178 Epoch: 11 Global Step: 143730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:31,073-Speed 3285.64 samples/sec Loss 3.4837 LearningRate 0.0178 Epoch: 11 Global Step: 143740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:34,141-Speed 3339.07 samples/sec Loss 3.4922 LearningRate 0.0178 Epoch: 11 Global Step: 143750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:37,272-Speed 3271.64 samples/sec Loss 3.5074 LearningRate 0.0177 Epoch: 11 Global Step: 143760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:40,387-Speed 3288.19 samples/sec Loss 3.5001 LearningRate 0.0177 Epoch: 11 Global Step: 143770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:10:43,486-Speed 3304.82 samples/sec Loss 3.4908 LearningRate 0.0177 Epoch: 11 Global Step: 143780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:10:46,612-Speed 3277.55 samples/sec Loss 3.5066 LearningRate 0.0177 Epoch: 11 Global Step: 143790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:10:49,714-Speed 3301.61 samples/sec Loss 3.3880 LearningRate 0.0177 Epoch: 11 Global Step: 143800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:10:52,806-Speed 3312.82 samples/sec Loss 3.4818 LearningRate 0.0177 Epoch: 11 Global Step: 143810 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:55,901-Speed 3309.86 samples/sec Loss 3.5066 LearningRate 0.0177 Epoch: 11 Global Step: 143820 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:10:58,969-Speed 3338.91 samples/sec Loss 3.4598 LearningRate 0.0177 Epoch: 11 Global Step: 143830 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:02,029-Speed 3347.52 samples/sec Loss 3.4375 LearningRate 0.0177 Epoch: 11 Global Step: 143840 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 14:11:05,158-Speed 3273.36 samples/sec Loss 3.4793 LearningRate 0.0177 Epoch: 11 Global Step: 143850 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:08,287-Speed 3274.17 samples/sec Loss 3.5357 LearningRate 0.0177 Epoch: 11 Global Step: 143860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:11,347-Speed 3347.28 samples/sec Loss 3.4692 LearningRate 0.0177 Epoch: 11 Global Step: 143870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:14,416-Speed 3336.76 samples/sec Loss 3.4104 LearningRate 0.0177 Epoch: 11 Global Step: 143880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:17,472-Speed 3352.41 samples/sec Loss 3.5173 LearningRate 0.0177 Epoch: 11 Global Step: 143890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:20,542-Speed 3336.76 samples/sec Loss 3.4711 LearningRate 0.0177 Epoch: 11 Global Step: 143900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:23,643-Speed 3302.46 samples/sec Loss 3.4396 LearningRate 0.0177 Epoch: 11 Global Step: 143910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:11:26,850-Speed 3194.00 samples/sec Loss 3.4712 LearningRate 0.0177 Epoch: 11 Global Step: 143920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:11:29,909-Speed 3348.74 samples/sec Loss 3.5173 LearningRate 0.0177 Epoch: 11 Global Step: 143930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:32,968-Speed 3349.42 samples/sec Loss 3.4461 LearningRate 0.0177 Epoch: 11 Global Step: 143940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:36,072-Speed 3299.89 samples/sec Loss 3.5535 LearningRate 0.0177 Epoch: 11 Global Step: 143950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:39,123-Speed 3357.25 samples/sec Loss 3.4125 LearningRate 0.0177 Epoch: 11 Global Step: 143960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:11:42,259-Speed 3266.16 samples/sec Loss 3.3872 LearningRate 0.0177 Epoch: 11 Global Step: 143970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:45,315-Speed 3352.38 samples/sec Loss 3.4914 LearningRate 0.0177 Epoch: 11 Global Step: 143980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:48,404-Speed 3315.44 samples/sec Loss 3.5379 LearningRate 0.0177 Epoch: 11 Global Step: 143990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:51,606-Speed 3199.31 samples/sec Loss 3.4748 LearningRate 0.0177 Epoch: 11 Global Step: 144000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:54,696-Speed 3315.50 samples/sec Loss 3.4771 LearningRate 0.0177 Epoch: 11 Global Step: 144010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:11:57,762-Speed 3340.21 samples/sec Loss 3.4812 LearningRate 0.0177 Epoch: 11 Global Step: 144020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:12:00,835-Speed 3333.45 samples/sec Loss 3.4570 LearningRate 0.0177 Epoch: 11 Global Step: 144030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:03,906-Speed 3335.91 samples/sec Loss 3.5094 LearningRate 0.0177 Epoch: 11 Global Step: 144040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:06,989-Speed 3323.01 samples/sec Loss 3.6417 LearningRate 0.0177 Epoch: 11 Global Step: 144050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:10,043-Speed 3353.49 samples/sec Loss 3.5270 LearningRate 0.0176 Epoch: 11 Global Step: 144060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:13,131-Speed 3317.15 samples/sec Loss 3.5410 LearningRate 0.0176 Epoch: 11 Global Step: 144070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:16,224-Speed 3311.55 samples/sec Loss 3.4562 LearningRate 0.0176 Epoch: 11 Global Step: 144080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:19,337-Speed 3291.02 samples/sec Loss 
3.5210 LearningRate 0.0176 Epoch: 11 Global Step: 144090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:22,412-Speed 3331.22 samples/sec Loss 3.5244 LearningRate 0.0176 Epoch: 11 Global Step: 144100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:25,533-Speed 3282.33 samples/sec Loss 3.4875 LearningRate 0.0176 Epoch: 11 Global Step: 144110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:28,598-Speed 3341.96 samples/sec Loss 3.5589 LearningRate 0.0176 Epoch: 11 Global Step: 144120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:31,675-Speed 3328.81 samples/sec Loss 3.4575 LearningRate 0.0176 Epoch: 11 Global Step: 144130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:12:34,716-Speed 3368.54 samples/sec Loss 3.4641 LearningRate 0.0176 Epoch: 11 Global Step: 144140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:37,848-Speed 3271.02 samples/sec Loss 3.5017 LearningRate 0.0176 Epoch: 11 Global Step: 144150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:41,071-Speed 3178.01 samples/sec Loss 3.4352 LearningRate 0.0176 Epoch: 11 Global Step: 144160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:44,182-Speed 3292.66 samples/sec Loss 3.4580 LearningRate 0.0176 Epoch: 11 Global Step: 144170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:47,256-Speed 3332.36 samples/sec Loss 3.4950 LearningRate 0.0176 Epoch: 11 Global Step: 144180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:50,330-Speed 3332.07 samples/sec Loss 3.4321 LearningRate 0.0176 Epoch: 11 Global Step: 144190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:53,470-Speed 3262.49 samples/sec Loss 3.5473 LearningRate 0.0176 Epoch: 11 Global Step: 144200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:56,529-Speed 3348.21 samples/sec Loss 3.4690 LearningRate 0.0176 Epoch: 11 Global 
Step: 144210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:12:59,631-Speed 3302.38 samples/sec Loss 3.4046 LearningRate 0.0176 Epoch: 11 Global Step: 144220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:13:02,788-Speed 3244.80 samples/sec Loss 3.4992 LearningRate 0.0176 Epoch: 11 Global Step: 144230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:05,883-Speed 3309.02 samples/sec Loss 3.4972 LearningRate 0.0176 Epoch: 11 Global Step: 144240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:08,978-Speed 3309.81 samples/sec Loss 3.3693 LearningRate 0.0176 Epoch: 11 Global Step: 144250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:12,188-Speed 3191.09 samples/sec Loss 3.4357 LearningRate 0.0176 Epoch: 11 Global Step: 144260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:15,319-Speed 3272.17 samples/sec Loss 3.4234 LearningRate 0.0176 Epoch: 11 Global Step: 144270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:18,395-Speed 3329.33 samples/sec Loss 3.4992 LearningRate 0.0176 Epoch: 11 Global Step: 144280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:21,452-Speed 3351.39 samples/sec Loss 3.4662 LearningRate 0.0176 Epoch: 11 Global Step: 144290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:24,566-Speed 3289.15 samples/sec Loss 3.4313 LearningRate 0.0176 Epoch: 11 Global Step: 144300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:27,678-Speed 3292.09 samples/sec Loss 3.5409 LearningRate 0.0176 Epoch: 11 Global Step: 144310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:30,841-Speed 3238.63 samples/sec Loss 3.4527 LearningRate 0.0176 Epoch: 11 Global Step: 144320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:33,910-Speed 3337.54 samples/sec Loss 3.4776 LearningRate 0.0176 Epoch: 11 Global Step: 144330 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-27 14:13:37,056-Speed 3256.20 samples/sec Loss 3.4112 LearningRate 0.0176 Epoch: 11 Global Step: 144340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:13:40,198-Speed 3259.58 samples/sec Loss 3.5649 LearningRate 0.0176 Epoch: 11 Global Step: 144350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:13:43,356-Speed 3244.24 samples/sec Loss 3.4942 LearningRate 0.0175 Epoch: 11 Global Step: 144360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:13:46,409-Speed 3354.45 samples/sec Loss 3.4650 LearningRate 0.0175 Epoch: 11 Global Step: 144370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:49,467-Speed 3350.29 samples/sec Loss 3.4012 LearningRate 0.0175 Epoch: 11 Global Step: 144380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:52,553-Speed 3319.51 samples/sec Loss 3.4591 LearningRate 0.0175 Epoch: 11 Global Step: 144390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:55,631-Speed 3328.02 samples/sec Loss 3.5062 LearningRate 0.0175 Epoch: 11 Global Step: 144400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:13:58,703-Speed 3333.55 samples/sec Loss 3.4464 LearningRate 0.0175 Epoch: 11 Global Step: 144410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:01,799-Speed 3309.16 samples/sec Loss 3.5826 LearningRate 0.0175 Epoch: 11 Global Step: 144420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:04,886-Speed 3318.71 samples/sec Loss 3.6252 LearningRate 0.0175 Epoch: 11 Global Step: 144430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:08,015-Speed 3273.66 samples/sec Loss 3.5049 LearningRate 0.0175 Epoch: 11 Global Step: 144440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:11,074-Speed 3348.74 samples/sec Loss 3.5815 LearningRate 0.0175 Epoch: 11 Global Step: 144450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:14:14,159-Speed 3320.14 samples/sec Loss 3.6119 LearningRate 0.0175 Epoch: 11 Global Step: 144460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:17,323-Speed 3237.73 samples/sec Loss 3.5000 LearningRate 0.0175 Epoch: 11 Global Step: 144470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:14:20,443-Speed 3282.83 samples/sec Loss 3.5409 LearningRate 0.0175 Epoch: 11 Global Step: 144480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:14:23,611-Speed 3233.88 samples/sec Loss 3.4509 LearningRate 0.0175 Epoch: 11 Global Step: 144490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:26,795-Speed 3216.88 samples/sec Loss 3.4963 LearningRate 0.0175 Epoch: 11 Global Step: 144500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:29,866-Speed 3334.97 samples/sec Loss 3.5666 LearningRate 0.0175 Epoch: 11 Global Step: 144510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:32,936-Speed 3337.03 samples/sec Loss 3.5257 LearningRate 0.0175 Epoch: 11 Global Step: 144520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:36,065-Speed 3273.62 samples/sec Loss 3.5295 LearningRate 0.0175 Epoch: 11 Global Step: 144530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:39,124-Speed 3348.62 samples/sec Loss 3.5363 LearningRate 0.0175 Epoch: 11 Global Step: 144540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:42,276-Speed 3250.18 samples/sec Loss 3.5464 LearningRate 0.0175 Epoch: 11 Global Step: 144550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:45,340-Speed 3343.22 samples/sec Loss 3.5231 LearningRate 0.0175 Epoch: 11 Global Step: 144560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:48,496-Speed 3244.71 samples/sec Loss 3.5556 LearningRate 0.0175 Epoch: 11 Global Step: 144570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:51,664-Speed 3234.04 samples/sec Loss 
3.4248 LearningRate 0.0175 Epoch: 11 Global Step: 144580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:14:54,799-Speed 3267.09 samples/sec Loss 3.5185 LearningRate 0.0175 Epoch: 11 Global Step: 144590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:14:57,858-Speed 3349.00 samples/sec Loss 3.4765 LearningRate 0.0175 Epoch: 11 Global Step: 144600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:00,909-Speed 3357.43 samples/sec Loss 3.4884 LearningRate 0.0175 Epoch: 11 Global Step: 144610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:03,984-Speed 3330.36 samples/sec Loss 3.5467 LearningRate 0.0175 Epoch: 11 Global Step: 144620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:07,060-Speed 3330.25 samples/sec Loss 3.4460 LearningRate 0.0175 Epoch: 11 Global Step: 144630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:10,163-Speed 3301.20 samples/sec Loss 3.5051 LearningRate 0.0175 Epoch: 11 Global Step: 144640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:13,274-Speed 3292.38 samples/sec Loss 3.4999 LearningRate 0.0174 Epoch: 11 Global Step: 144650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:16,417-Speed 3259.91 samples/sec Loss 3.4928 LearningRate 0.0174 Epoch: 11 Global Step: 144660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:19,540-Speed 3279.92 samples/sec Loss 3.4862 LearningRate 0.0174 Epoch: 11 Global Step: 144670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:22,610-Speed 3336.53 samples/sec Loss 3.4411 LearningRate 0.0174 Epoch: 11 Global Step: 144680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:25,737-Speed 3275.33 samples/sec Loss 3.5118 LearningRate 0.0174 Epoch: 11 Global Step: 144690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:15:28,810-Speed 3333.56 samples/sec Loss 3.4867 LearningRate 0.0174 Epoch: 11 Global 
Step: 144700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:15:31,866-Speed 3351.73 samples/sec Loss 3.4933 LearningRate 0.0174 Epoch: 11 Global Step: 144710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:15:35,014-Speed 3253.61 samples/sec Loss 3.4995 LearningRate 0.0174 Epoch: 11 Global Step: 144720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:38,166-Speed 3250.37 samples/sec Loss 3.4625 LearningRate 0.0174 Epoch: 11 Global Step: 144730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:15:41,257-Speed 3313.68 samples/sec Loss 3.5980 LearningRate 0.0174 Epoch: 11 Global Step: 144740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:15:44,388-Speed 3271.70 samples/sec Loss 3.4495 LearningRate 0.0174 Epoch: 11 Global Step: 144750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:15:47,538-Speed 3251.94 samples/sec Loss 3.4899 LearningRate 0.0174 Epoch: 11 Global Step: 144760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:15:50,643-Speed 3299.48 samples/sec Loss 3.4252 LearningRate 0.0174 Epoch: 11 Global Step: 144770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:15:53,715-Speed 3333.26 samples/sec Loss 3.4640 LearningRate 0.0174 Epoch: 11 Global Step: 144780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:15:56,865-Speed 3253.06 samples/sec Loss 3.4870 LearningRate 0.0174 Epoch: 11 Global Step: 144790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:15:59,940-Speed 3330.25 samples/sec Loss 3.5096 LearningRate 0.0174 Epoch: 11 Global Step: 144800 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:03,167-Speed 3175.26 samples/sec Loss 3.5442 LearningRate 0.0174 Epoch: 11 Global Step: 144810 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:06,238-Speed 3335.31 samples/sec Loss 3.5264 LearningRate 0.0174 Epoch: 11 Global Step: 144820 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 14:16:09,311-Speed 3333.01 samples/sec Loss 3.4718 LearningRate 0.0174 Epoch: 11 Global Step: 144830 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:12,398-Speed 3318.62 samples/sec Loss 3.5521 LearningRate 0.0174 Epoch: 11 Global Step: 144840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:16:15,465-Speed 3339.52 samples/sec Loss 3.4907 LearningRate 0.0174 Epoch: 11 Global Step: 144850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:16:18,585-Speed 3282.59 samples/sec Loss 3.4090 LearningRate 0.0174 Epoch: 11 Global Step: 144860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:16:21,642-Speed 3351.07 samples/sec Loss 3.5299 LearningRate 0.0174 Epoch: 11 Global Step: 144870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:16:24,776-Speed 3268.80 samples/sec Loss 3.5433 LearningRate 0.0174 Epoch: 11 Global Step: 144880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:16:27,933-Speed 3243.96 samples/sec Loss 3.5624 LearningRate 0.0174 Epoch: 11 Global Step: 144890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:16:31,053-Speed 3283.71 samples/sec Loss 3.4852 LearningRate 0.0174 Epoch: 11 Global Step: 144900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:34,133-Speed 3325.43 samples/sec Loss 3.4560 LearningRate 0.0174 Epoch: 11 Global Step: 144910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:37,236-Speed 3301.02 samples/sec Loss 3.4783 LearningRate 0.0174 Epoch: 11 Global Step: 144920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:40,353-Speed 3286.84 samples/sec Loss 3.4694 LearningRate 0.0174 Epoch: 11 Global Step: 144930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:43,477-Speed 3278.03 samples/sec Loss 3.4754 LearningRate 0.0174 Epoch: 11 Global Step: 144940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:16:46,562-Speed 3321.31 samples/sec Loss 3.4901 LearningRate 0.0173 Epoch: 11 Global Step: 144950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:49,673-Speed 3292.56 samples/sec Loss 3.4867 LearningRate 0.0173 Epoch: 11 Global Step: 144960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:52,790-Speed 3285.06 samples/sec Loss 3.5273 LearningRate 0.0173 Epoch: 11 Global Step: 144970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:55,901-Speed 3293.40 samples/sec Loss 3.4415 LearningRate 0.0173 Epoch: 11 Global Step: 144980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:16:58,986-Speed 3320.33 samples/sec Loss 3.4795 LearningRate 0.0173 Epoch: 11 Global Step: 144990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:02,172-Speed 3214.91 samples/sec Loss 3.5497 LearningRate 0.0173 Epoch: 11 Global Step: 145000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:05,294-Speed 3281.75 samples/sec Loss 3.4938 LearningRate 0.0173 Epoch: 11 Global Step: 145010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:08,380-Speed 3319.47 samples/sec Loss 3.5146 LearningRate 0.0173 Epoch: 11 Global Step: 145020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:11,473-Speed 3311.77 samples/sec Loss 3.5200 LearningRate 0.0173 Epoch: 11 Global Step: 145030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:14,594-Speed 3281.85 samples/sec Loss 3.5009 LearningRate 0.0173 Epoch: 11 Global Step: 145040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:17,785-Speed 3210.40 samples/sec Loss 3.5196 LearningRate 0.0173 Epoch: 11 Global Step: 145050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:20,877-Speed 3312.47 samples/sec Loss 3.4813 LearningRate 0.0173 Epoch: 11 Global Step: 145060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:23,994-Speed 3286.50 samples/sec Loss 
3.4533 LearningRate 0.0173 Epoch: 11 Global Step: 145070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:27,111-Speed 3286.44 samples/sec Loss 3.6106 LearningRate 0.0173 Epoch: 11 Global Step: 145080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:30,249-Speed 3263.76 samples/sec Loss 3.4790 LearningRate 0.0173 Epoch: 11 Global Step: 145090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:33,350-Speed 3302.92 samples/sec Loss 3.5029 LearningRate 0.0173 Epoch: 11 Global Step: 145100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:36,481-Speed 3271.74 samples/sec Loss 3.5358 LearningRate 0.0173 Epoch: 11 Global Step: 145110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:39,589-Speed 3295.78 samples/sec Loss 3.5227 LearningRate 0.0173 Epoch: 11 Global Step: 145120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:42,683-Speed 3311.41 samples/sec Loss 3.5324 LearningRate 0.0173 Epoch: 11 Global Step: 145130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:45,767-Speed 3321.11 samples/sec Loss 3.5919 LearningRate 0.0173 Epoch: 11 Global Step: 145140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:17:48,833-Speed 3340.60 samples/sec Loss 3.5468 LearningRate 0.0173 Epoch: 11 Global Step: 145150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:51,919-Speed 3318.89 samples/sec Loss 3.4523 LearningRate 0.0173 Epoch: 11 Global Step: 145160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:55,059-Speed 3262.63 samples/sec Loss 3.5687 LearningRate 0.0173 Epoch: 11 Global Step: 145170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:17:58,126-Speed 3340.54 samples/sec Loss 3.4714 LearningRate 0.0173 Epoch: 11 Global Step: 145180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:18:01,223-Speed 3306.57 samples/sec Loss 3.5805 LearningRate 0.0173 Epoch: 11 Global 
Step: 145190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:18:04,343-Speed 3284.20 samples/sec Loss 3.4282 LearningRate 0.0173 Epoch: 11 Global Step: 145200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:18:07,439-Speed 3308.69 samples/sec Loss 3.4683 LearningRate 0.0173 Epoch: 11 Global Step: 145210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:10,550-Speed 3292.28 samples/sec Loss 3.4669 LearningRate 0.0173 Epoch: 11 Global Step: 145220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:13,650-Speed 3303.73 samples/sec Loss 3.5368 LearningRate 0.0173 Epoch: 11 Global Step: 145230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:16,764-Speed 3289.78 samples/sec Loss 3.5076 LearningRate 0.0173 Epoch: 11 Global Step: 145240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:19,883-Speed 3284.33 samples/sec Loss 3.5185 LearningRate 0.0172 Epoch: 11 Global Step: 145250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:23,014-Speed 3270.68 samples/sec Loss 3.4180 LearningRate 0.0172 Epoch: 11 Global Step: 145260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:26,118-Speed 3300.03 samples/sec Loss 3.5346 LearningRate 0.0172 Epoch: 11 Global Step: 145270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:29,267-Speed 3253.12 samples/sec Loss 3.4705 LearningRate 0.0172 Epoch: 11 Global Step: 145280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:32,339-Speed 3335.00 samples/sec Loss 3.5537 LearningRate 0.0172 Epoch: 11 Global Step: 145290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:35,418-Speed 3326.82 samples/sec Loss 3.5318 LearningRate 0.0172 Epoch: 11 Global Step: 145300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:38,526-Speed 3295.75 samples/sec Loss 3.4495 LearningRate 0.0172 Epoch: 11 Global Step: 145310 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-27 14:18:41,621-Speed 3309.69 samples/sec Loss 3.4073 LearningRate 0.0172 Epoch: 11 Global Step: 145320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:18:44,714-Speed 3311.50 samples/sec Loss 3.4843 LearningRate 0.0172 Epoch: 11 Global Step: 145330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:18:47,799-Speed 3319.78 samples/sec Loss 3.4639 LearningRate 0.0172 Epoch: 11 Global Step: 145340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:18:50,895-Speed 3309.18 samples/sec Loss 3.4624 LearningRate 0.0172 Epoch: 11 Global Step: 145350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:18:53,993-Speed 3306.52 samples/sec Loss 3.5628 LearningRate 0.0172 Epoch: 11 Global Step: 145360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:18:57,062-Speed 3337.83 samples/sec Loss 3.5364 LearningRate 0.0172 Epoch: 11 Global Step: 145370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:00,121-Speed 3348.39 samples/sec Loss 3.4832 LearningRate 0.0172 Epoch: 11 Global Step: 145380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:03,199-Speed 3326.83 samples/sec Loss 3.4062 LearningRate 0.0172 Epoch: 11 Global Step: 145390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:06,301-Speed 3302.28 samples/sec Loss 3.5309 LearningRate 0.0172 Epoch: 11 Global Step: 145400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:09,393-Speed 3313.34 samples/sec Loss 3.4808 LearningRate 0.0172 Epoch: 11 Global Step: 145410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:12,536-Speed 3259.37 samples/sec Loss 3.4748 LearningRate 0.0172 Epoch: 11 Global Step: 145420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:15,636-Speed 3303.84 samples/sec Loss 3.5670 LearningRate 0.0172 Epoch: 11 Global Step: 145430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:19:18,730-Speed 3311.21 samples/sec Loss 3.4565 LearningRate 0.0172 Epoch: 11 Global Step: 145440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:21,786-Speed 3351.85 samples/sec Loss 3.4981 LearningRate 0.0172 Epoch: 11 Global Step: 145450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:24,956-Speed 3231.48 samples/sec Loss 3.5093 LearningRate 0.0172 Epoch: 11 Global Step: 145460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:19:28,058-Speed 3302.52 samples/sec Loss 3.4671 LearningRate 0.0172 Epoch: 11 Global Step: 145470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:19:31,131-Speed 3332.90 samples/sec Loss 3.5766 LearningRate 0.0172 Epoch: 11 Global Step: 145480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:19:34,189-Speed 3350.07 samples/sec Loss 3.5325 LearningRate 0.0172 Epoch: 11 Global Step: 145490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:19:37,281-Speed 3312.80 samples/sec Loss 3.5883 LearningRate 0.0172 Epoch: 11 Global Step: 145500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:19:40,489-Speed 3193.33 samples/sec Loss 3.5632 LearningRate 0.0172 Epoch: 11 Global Step: 145510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:19:43,618-Speed 3273.65 samples/sec Loss 3.5034 LearningRate 0.0172 Epoch: 11 Global Step: 145520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:46,665-Speed 3361.28 samples/sec Loss 3.5474 LearningRate 0.0172 Epoch: 11 Global Step: 145530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:49,770-Speed 3298.99 samples/sec Loss 3.5011 LearningRate 0.0172 Epoch: 11 Global Step: 145540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:52,911-Speed 3261.56 samples/sec Loss 3.4741 LearningRate 0.0171 Epoch: 11 Global Step: 145550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:55,967-Speed 3351.95 samples/sec Loss 
3.5442 LearningRate 0.0171 Epoch: 11 Global Step: 145560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:19:59,062-Speed 3309.02 samples/sec Loss 3.5392 LearningRate 0.0171 Epoch: 11 Global Step: 145570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:02,159-Speed 3307.61 samples/sec Loss 3.5495 LearningRate 0.0171 Epoch: 11 Global Step: 145580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:05,223-Speed 3344.35 samples/sec Loss 3.5283 LearningRate 0.0171 Epoch: 11 Global Step: 145590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:08,323-Speed 3303.60 samples/sec Loss 3.5344 LearningRate 0.0171 Epoch: 11 Global Step: 145600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:11,441-Speed 3285.58 samples/sec Loss 3.5911 LearningRate 0.0171 Epoch: 11 Global Step: 145610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:14,540-Speed 3304.95 samples/sec Loss 3.5004 LearningRate 0.0171 Epoch: 11 Global Step: 145620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:17,612-Speed 3334.78 samples/sec Loss 3.5359 LearningRate 0.0171 Epoch: 11 Global Step: 145630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:21,408-Speed 2698.03 samples/sec Loss 3.4735 LearningRate 0.0171 Epoch: 11 Global Step: 145640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:24,594-Speed 3214.69 samples/sec Loss 3.4934 LearningRate 0.0171 Epoch: 11 Global Step: 145650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:27,713-Speed 3284.27 samples/sec Loss 3.4796 LearningRate 0.0171 Epoch: 11 Global Step: 145660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:30,819-Speed 3297.86 samples/sec Loss 3.3514 LearningRate 0.0171 Epoch: 11 Global Step: 145670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:33,921-Speed 3302.64 samples/sec Loss 3.5703 LearningRate 0.0171 Epoch: 11 Global 
Step: 145680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:37,033-Speed 3291.11 samples/sec Loss 3.4111 LearningRate 0.0171 Epoch: 11 Global Step: 145690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:40,212-Speed 3222.53 samples/sec Loss 3.5077 LearningRate 0.0171 Epoch: 11 Global Step: 145700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:43,327-Speed 3288.27 samples/sec Loss 3.4634 LearningRate 0.0171 Epoch: 11 Global Step: 145710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:46,432-Speed 3299.35 samples/sec Loss 3.5351 LearningRate 0.0171 Epoch: 11 Global Step: 145720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:20:49,487-Speed 3352.51 samples/sec Loss 3.5027 LearningRate 0.0171 Epoch: 11 Global Step: 145730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:52,614-Speed 3276.21 samples/sec Loss 3.4955 LearningRate 0.0171 Epoch: 11 Global Step: 145740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:55,754-Speed 3262.49 samples/sec Loss 3.5495 LearningRate 0.0171 Epoch: 11 Global Step: 145750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:20:58,852-Speed 3305.98 samples/sec Loss 3.5434 LearningRate 0.0171 Epoch: 11 Global Step: 145760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:01,979-Speed 3275.52 samples/sec Loss 3.5556 LearningRate 0.0171 Epoch: 11 Global Step: 145770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:05,116-Speed 3265.18 samples/sec Loss 3.4668 LearningRate 0.0171 Epoch: 11 Global Step: 145780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:08,219-Speed 3301.69 samples/sec Loss 3.4544 LearningRate 0.0171 Epoch: 11 Global Step: 145790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:11,289-Speed 3336.91 samples/sec Loss 3.5836 LearningRate 0.0171 Epoch: 11 Global Step: 145800 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 14:21:14,381-Speed 3311.67 samples/sec Loss 3.5510 LearningRate 0.0171 Epoch: 11 Global Step: 145810 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:17,474-Speed 3312.09 samples/sec Loss 3.5310 LearningRate 0.0171 Epoch: 11 Global Step: 145820 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:20,580-Speed 3298.05 samples/sec Loss 3.5786 LearningRate 0.0171 Epoch: 11 Global Step: 145830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:21:23,647-Speed 3340.38 samples/sec Loss 3.4772 LearningRate 0.0171 Epoch: 11 Global Step: 145840 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:26,857-Speed 3190.76 samples/sec Loss 3.5346 LearningRate 0.0170 Epoch: 11 Global Step: 145850 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:29,939-Speed 3323.96 samples/sec Loss 3.5030 LearningRate 0.0170 Epoch: 11 Global Step: 145860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:33,135-Speed 3204.67 samples/sec Loss 3.4622 LearningRate 0.0170 Epoch: 11 Global Step: 145870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:36,266-Speed 3272.07 samples/sec Loss 3.4781 LearningRate 0.0170 Epoch: 11 Global Step: 145880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:39,355-Speed 3315.65 samples/sec Loss 3.4824 LearningRate 0.0170 Epoch: 11 Global Step: 145890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:42,501-Speed 3255.72 samples/sec Loss 3.5003 LearningRate 0.0170 Epoch: 11 Global Step: 145900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:45,589-Speed 3317.66 samples/sec Loss 3.5027 LearningRate 0.0170 Epoch: 11 Global Step: 145910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:48,722-Speed 3268.71 samples/sec Loss 3.4968 LearningRate 0.0170 Epoch: 11 Global Step: 145920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:21:51,818-Speed 3308.73 samples/sec Loss 3.4762 LearningRate 0.0170 Epoch: 11 Global Step: 145930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:21:54,987-Speed 3232.69 samples/sec Loss 3.5610 LearningRate 0.0170 Epoch: 11 Global Step: 145940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:21:58,052-Speed 3342.68 samples/sec Loss 3.5755 LearningRate 0.0170 Epoch: 11 Global Step: 145950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:01,177-Speed 3277.51 samples/sec Loss 3.4914 LearningRate 0.0170 Epoch: 11 Global Step: 145960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:04,339-Speed 3239.66 samples/sec Loss 3.5309 LearningRate 0.0170 Epoch: 11 Global Step: 145970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:07,470-Speed 3271.56 samples/sec Loss 3.4701 LearningRate 0.0170 Epoch: 11 Global Step: 145980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:10,579-Speed 3293.87 samples/sec Loss 3.4996 LearningRate 0.0170 Epoch: 11 Global Step: 145990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:13,716-Speed 3265.18 samples/sec Loss 3.5189 LearningRate 0.0170 Epoch: 11 Global Step: 146000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:16,834-Speed 3285.60 samples/sec Loss 3.4853 LearningRate 0.0170 Epoch: 11 Global Step: 146010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:19,954-Speed 3283.25 samples/sec Loss 3.4508 LearningRate 0.0170 Epoch: 11 Global Step: 146020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:23,065-Speed 3292.33 samples/sec Loss 3.5791 LearningRate 0.0170 Epoch: 11 Global Step: 146030 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:26,162-Speed 3307.76 samples/sec Loss 3.4951 LearningRate 0.0170 Epoch: 11 Global Step: 146040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:29,301-Speed 3263.69 samples/sec Loss 
3.5335 LearningRate 0.0170 Epoch: 11 Global Step: 146050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:32,480-Speed 3221.40 samples/sec Loss 3.5250 LearningRate 0.0170 Epoch: 11 Global Step: 146060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:35,632-Speed 3249.92 samples/sec Loss 3.5803 LearningRate 0.0170 Epoch: 11 Global Step: 146070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:38,726-Speed 3311.17 samples/sec Loss 3.4718 LearningRate 0.0170 Epoch: 11 Global Step: 146080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:41,860-Speed 3268.42 samples/sec Loss 3.4943 LearningRate 0.0170 Epoch: 11 Global Step: 146090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:45,011-Speed 3250.15 samples/sec Loss 3.6001 LearningRate 0.0170 Epoch: 11 Global Step: 146100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:48,128-Speed 3286.42 samples/sec Loss 3.5148 LearningRate 0.0170 Epoch: 11 Global Step: 146110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:51,278-Speed 3252.30 samples/sec Loss 3.4975 LearningRate 0.0170 Epoch: 11 Global Step: 146120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:22:54,351-Speed 3333.50 samples/sec Loss 3.5799 LearningRate 0.0170 Epoch: 11 Global Step: 146130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:22:57,432-Speed 3324.57 samples/sec Loss 3.5193 LearningRate 0.0170 Epoch: 11 Global Step: 146140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:00,555-Speed 3280.08 samples/sec Loss 3.5126 LearningRate 0.0169 Epoch: 11 Global Step: 146150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:03,735-Speed 3221.10 samples/sec Loss 3.5709 LearningRate 0.0169 Epoch: 11 Global Step: 146160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:06,863-Speed 3274.44 samples/sec Loss 3.4741 LearningRate 0.0169 Epoch: 11 Global 
Step: 146170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:09,969-Speed 3298.40 samples/sec Loss 3.4714 LearningRate 0.0169 Epoch: 11 Global Step: 146180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:13,242-Speed 3128.59 samples/sec Loss 3.5414 LearningRate 0.0169 Epoch: 11 Global Step: 146190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:16,417-Speed 3226.58 samples/sec Loss 3.5642 LearningRate 0.0169 Epoch: 11 Global Step: 146200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:19,516-Speed 3305.75 samples/sec Loss 3.4899 LearningRate 0.0169 Epoch: 11 Global Step: 146210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:22,603-Speed 3317.71 samples/sec Loss 3.5778 LearningRate 0.0169 Epoch: 11 Global Step: 146220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:25,708-Speed 3299.72 samples/sec Loss 3.5236 LearningRate 0.0169 Epoch: 11 Global Step: 146230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:23:28,853-Speed 3256.33 samples/sec Loss 3.4730 LearningRate 0.0169 Epoch: 11 Global Step: 146240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:23:31,967-Speed 3288.75 samples/sec Loss 3.4586 LearningRate 0.0169 Epoch: 11 Global Step: 146250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:23:35,051-Speed 3321.89 samples/sec Loss 3.4458 LearningRate 0.0169 Epoch: 11 Global Step: 146260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:23:38,210-Speed 3243.16 samples/sec Loss 3.5597 LearningRate 0.0169 Epoch: 11 Global Step: 146270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:23:41,304-Speed 3310.63 samples/sec Loss 3.5075 LearningRate 0.0169 Epoch: 11 Global Step: 146280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:23:44,364-Speed 3346.49 samples/sec Loss 3.4982 LearningRate 0.0169 Epoch: 11 Global Step: 146290 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 14:23:47,446-Speed 3324.21 samples/sec Loss 3.4708 LearningRate 0.0169 Epoch: 11 Global Step: 146300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:50,543-Speed 3306.97 samples/sec Loss 3.5425 LearningRate 0.0169 Epoch: 11 Global Step: 146310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:53,668-Speed 3278.39 samples/sec Loss 3.5383 LearningRate 0.0169 Epoch: 11 Global Step: 146320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:56,744-Speed 3330.26 samples/sec Loss 3.5056 LearningRate 0.0169 Epoch: 11 Global Step: 146330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:23:59,844-Speed 3304.62 samples/sec Loss 3.6085 LearningRate 0.0169 Epoch: 11 Global Step: 146340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:03,032-Speed 3212.44 samples/sec Loss 3.4795 LearningRate 0.0169 Epoch: 11 Global Step: 146350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:06,185-Speed 3248.85 samples/sec Loss 3.6198 LearningRate 0.0169 Epoch: 11 Global Step: 146360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:09,249-Speed 3342.73 samples/sec Loss 3.5278 LearningRate 0.0169 Epoch: 11 Global Step: 146370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:12,379-Speed 3273.01 samples/sec Loss 3.5749 LearningRate 0.0169 Epoch: 11 Global Step: 146380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:15,476-Speed 3307.79 samples/sec Loss 3.4046 LearningRate 0.0169 Epoch: 11 Global Step: 146390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:24:18,632-Speed 3245.43 samples/sec Loss 3.5019 LearningRate 0.0169 Epoch: 11 Global Step: 146400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:24:21,686-Speed 3354.52 samples/sec Loss 3.5414 LearningRate 0.0169 Epoch: 11 Global Step: 146410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
14:24:24,791-Speed 3298.78 samples/sec Loss 3.4552 LearningRate 0.0169 Epoch: 11 Global Step: 146420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:24:27,909-Speed 3285.39 samples/sec Loss 3.4999 LearningRate 0.0169 Epoch: 11 Global Step: 146430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:24:31,009-Speed 3304.04 samples/sec Loss 3.5142 LearningRate 0.0169 Epoch: 11 Global Step: 146440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:24:34,050-Speed 3368.93 samples/sec Loss 3.5767 LearningRate 0.0168 Epoch: 11 Global Step: 146450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:37,108-Speed 3349.92 samples/sec Loss 3.4657 LearningRate 0.0168 Epoch: 11 Global Step: 146460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:40,189-Speed 3324.02 samples/sec Loss 3.5562 LearningRate 0.0168 Epoch: 11 Global Step: 146470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:43,295-Speed 3297.62 samples/sec Loss 3.5508 LearningRate 0.0168 Epoch: 11 Global Step: 146480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:46,347-Speed 3357.06 samples/sec Loss 3.5571 LearningRate 0.0168 Epoch: 11 Global Step: 146490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:49,440-Speed 3311.97 samples/sec Loss 3.5685 LearningRate 0.0168 Epoch: 11 Global Step: 146500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:52,551-Speed 3291.99 samples/sec Loss 3.5223 LearningRate 0.0168 Epoch: 11 Global Step: 146510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:55,609-Speed 3349.87 samples/sec Loss 3.5825 LearningRate 0.0168 Epoch: 11 Global Step: 146520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:24:58,691-Speed 3323.39 samples/sec Loss 3.5421 LearningRate 0.0168 Epoch: 11 Global Step: 146530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:01,805-Speed 3289.60 samples/sec Loss 
3.5253 LearningRate 0.0168 Epoch: 11 Global Step: 146540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:04,900-Speed 3309.66 samples/sec Loss 3.5742 LearningRate 0.0168 Epoch: 11 Global Step: 146550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:25:07,991-Speed 3314.61 samples/sec Loss 3.5621 LearningRate 0.0168 Epoch: 11 Global Step: 146560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:25:11,023-Speed 3377.32 samples/sec Loss 3.5212 LearningRate 0.0168 Epoch: 11 Global Step: 146570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:14,085-Speed 3345.75 samples/sec Loss 3.4995 LearningRate 0.0168 Epoch: 11 Global Step: 146580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:17,187-Speed 3302.65 samples/sec Loss 3.4200 LearningRate 0.0168 Epoch: 11 Global Step: 146590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:20,256-Speed 3337.70 samples/sec Loss 3.6831 LearningRate 0.0168 Epoch: 11 Global Step: 146600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:23,400-Speed 3257.55 samples/sec Loss 3.5916 LearningRate 0.0168 Epoch: 11 Global Step: 146610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:26,606-Speed 3195.81 samples/sec Loss 3.5634 LearningRate 0.0168 Epoch: 11 Global Step: 146620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:29,742-Speed 3265.55 samples/sec Loss 3.5302 LearningRate 0.0168 Epoch: 11 Global Step: 146630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:32,813-Speed 3335.53 samples/sec Loss 3.5093 LearningRate 0.0168 Epoch: 11 Global Step: 146640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:35,925-Speed 3292.37 samples/sec Loss 3.5048 LearningRate 0.0168 Epoch: 11 Global Step: 146650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:39,020-Speed 3309.61 samples/sec Loss 3.4411 LearningRate 0.0168 Epoch: 11 Global 
Step: 146660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:42,085-Speed 3341.66 samples/sec Loss 3.4894 LearningRate 0.0168 Epoch: 11 Global Step: 146670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:25:45,142-Speed 3350.61 samples/sec Loss 3.5491 LearningRate 0.0168 Epoch: 11 Global Step: 146680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:25:48,248-Speed 3298.35 samples/sec Loss 3.5505 LearningRate 0.0168 Epoch: 11 Global Step: 146690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:51,336-Speed 3317.29 samples/sec Loss 3.5181 LearningRate 0.0168 Epoch: 11 Global Step: 146700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:54,427-Speed 3313.93 samples/sec Loss 3.6137 LearningRate 0.0168 Epoch: 11 Global Step: 146710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:25:57,539-Speed 3291.10 samples/sec Loss 3.5583 LearningRate 0.0168 Epoch: 11 Global Step: 146720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:00,636-Speed 3308.13 samples/sec Loss 3.4937 LearningRate 0.0168 Epoch: 11 Global Step: 146730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:03,812-Speed 3225.20 samples/sec Loss 3.5115 LearningRate 0.0168 Epoch: 11 Global Step: 146740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:06,877-Speed 3341.96 samples/sec Loss 3.5775 LearningRate 0.0167 Epoch: 11 Global Step: 146750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:09,915-Speed 3371.97 samples/sec Loss 3.5558 LearningRate 0.0167 Epoch: 11 Global Step: 146760 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:13,055-Speed 3262.10 samples/sec Loss 3.4522 LearningRate 0.0167 Epoch: 11 Global Step: 146770 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:16,151-Speed 3308.15 samples/sec Loss 3.5521 LearningRate 0.0167 Epoch: 11 Global Step: 146780 Fp16 Grad Scale: 8192 Required: 
9 hours Training: 2022-04-27 14:26:19,273-Speed 3280.61 samples/sec Loss 3.4916 LearningRate 0.0167 Epoch: 11 Global Step: 146790 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:22,361-Speed 3317.93 samples/sec Loss 3.4916 LearningRate 0.0167 Epoch: 11 Global Step: 146800 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:25,478-Speed 3286.46 samples/sec Loss 3.5341 LearningRate 0.0167 Epoch: 11 Global Step: 146810 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:28,666-Speed 3212.21 samples/sec Loss 3.4978 LearningRate 0.0167 Epoch: 11 Global Step: 146820 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:31,744-Speed 3328.43 samples/sec Loss 3.4698 LearningRate 0.0167 Epoch: 11 Global Step: 146830 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:34,817-Speed 3333.24 samples/sec Loss 3.5998 LearningRate 0.0167 Epoch: 11 Global Step: 146840 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:37,912-Speed 3309.50 samples/sec Loss 3.6002 LearningRate 0.0167 Epoch: 11 Global Step: 146850 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:26:41,042-Speed 3272.14 samples/sec Loss 3.4496 LearningRate 0.0167 Epoch: 11 Global Step: 146860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:44,130-Speed 3316.83 samples/sec Loss 3.5491 LearningRate 0.0167 Epoch: 11 Global Step: 146870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:47,206-Speed 3330.68 samples/sec Loss 3.5009 LearningRate 0.0167 Epoch: 11 Global Step: 146880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:50,275-Speed 3337.28 samples/sec Loss 3.5012 LearningRate 0.0167 Epoch: 11 Global Step: 146890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:53,406-Speed 3272.30 samples/sec Loss 3.5361 LearningRate 0.0167 Epoch: 11 Global Step: 146900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:56,533-Speed 
3275.23 samples/sec Loss 3.4733 LearningRate 0.0167 Epoch: 11 Global Step: 146910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:26:59,694-Speed 3240.82 samples/sec Loss 3.4860 LearningRate 0.0167 Epoch: 11 Global Step: 146920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:27:02,870-Speed 3225.48 samples/sec Loss 3.5405 LearningRate 0.0167 Epoch: 11 Global Step: 146930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:27:06,018-Speed 3253.83 samples/sec Loss 3.5273 LearningRate 0.0167 Epoch: 11 Global Step: 146940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:27:09,078-Speed 3347.42 samples/sec Loss 3.5343 LearningRate 0.0167 Epoch: 11 Global Step: 146950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:27:12,239-Speed 3240.39 samples/sec Loss 3.4569 LearningRate 0.0167 Epoch: 11 Global Step: 146960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:15,317-Speed 3328.26 samples/sec Loss 3.5374 LearningRate 0.0167 Epoch: 11 Global Step: 146970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:18,464-Speed 3254.35 samples/sec Loss 3.5049 LearningRate 0.0167 Epoch: 11 Global Step: 146980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:21,534-Speed 3336.14 samples/sec Loss 3.5315 LearningRate 0.0167 Epoch: 11 Global Step: 146990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:24,625-Speed 3314.61 samples/sec Loss 3.6343 LearningRate 0.0167 Epoch: 11 Global Step: 147000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:27,757-Speed 3269.80 samples/sec Loss 3.4755 LearningRate 0.0167 Epoch: 11 Global Step: 147010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:30,866-Speed 3294.77 samples/sec Loss 3.4919 LearningRate 0.0167 Epoch: 11 Global Step: 147020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:33,952-Speed 3319.82 samples/sec Loss 3.5162 
LearningRate 0.0167 Epoch: 11 Global Step: 147030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:37,057-Speed 3298.49 samples/sec Loss 3.5342 LearningRate 0.0167 Epoch: 11 Global Step: 147040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:40,180-Speed 3280.01 samples/sec Loss 3.5251 LearningRate 0.0167 Epoch: 11 Global Step: 147050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:43,326-Speed 3256.33 samples/sec Loss 3.5881 LearningRate 0.0166 Epoch: 11 Global Step: 147060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:27:46,390-Speed 3343.16 samples/sec Loss 3.5126 LearningRate 0.0166 Epoch: 11 Global Step: 147070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:49,546-Speed 3245.48 samples/sec Loss 3.5535 LearningRate 0.0166 Epoch: 11 Global Step: 147080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:52,683-Speed 3265.07 samples/sec Loss 3.4166 LearningRate 0.0166 Epoch: 11 Global Step: 147090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:55,788-Speed 3298.91 samples/sec Loss 3.4308 LearningRate 0.0166 Epoch: 11 Global Step: 147100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:27:58,832-Speed 3365.70 samples/sec Loss 3.6180 LearningRate 0.0166 Epoch: 11 Global Step: 147110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:01,986-Speed 3247.81 samples/sec Loss 3.4876 LearningRate 0.0166 Epoch: 11 Global Step: 147120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:05,077-Speed 3314.10 samples/sec Loss 3.4752 LearningRate 0.0166 Epoch: 11 Global Step: 147130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:08,143-Speed 3340.75 samples/sec Loss 3.5571 LearningRate 0.0166 Epoch: 11 Global Step: 147140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:11,326-Speed 3218.48 samples/sec Loss 3.5377 LearningRate 0.0166 Epoch: 11 Global Step: 
147150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:14,424-Speed 3306.33 samples/sec Loss 3.5572 LearningRate 0.0166 Epoch: 11 Global Step: 147160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:17,494-Speed 3336.79 samples/sec Loss 3.5546 LearningRate 0.0166 Epoch: 11 Global Step: 147170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:20,548-Speed 3353.66 samples/sec Loss 3.4559 LearningRate 0.0166 Epoch: 11 Global Step: 147180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:23,668-Speed 3283.08 samples/sec Loss 3.5302 LearningRate 0.0166 Epoch: 11 Global Step: 147190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:26,747-Speed 3327.18 samples/sec Loss 3.4757 LearningRate 0.0166 Epoch: 11 Global Step: 147200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:28:29,844-Speed 3307.08 samples/sec Loss 3.5930 LearningRate 0.0166 Epoch: 11 Global Step: 147210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:32,923-Speed 3327.20 samples/sec Loss 3.5378 LearningRate 0.0166 Epoch: 11 Global Step: 147220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:35,981-Speed 3349.95 samples/sec Loss 3.5971 LearningRate 0.0166 Epoch: 11 Global Step: 147230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:39,090-Speed 3294.17 samples/sec Loss 3.5215 LearningRate 0.0166 Epoch: 11 Global Step: 147240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:42,238-Speed 3253.94 samples/sec Loss 3.5457 LearningRate 0.0166 Epoch: 11 Global Step: 147250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:45,325-Speed 3318.03 samples/sec Loss 3.4896 LearningRate 0.0166 Epoch: 11 Global Step: 147260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:48,476-Speed 3250.42 samples/sec Loss 3.5232 LearningRate 0.0166 Epoch: 11 Global Step: 147270 Fp16 Grad Scale: 32768 Required: 9 
hours Training: 2022-04-27 14:28:51,633-Speed 3245.34 samples/sec Loss 3.4300 LearningRate 0.0166 Epoch: 11 Global Step: 147280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:54,780-Speed 3255.40 samples/sec Loss 3.5819 LearningRate 0.0166 Epoch: 11 Global Step: 147290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:28:57,858-Speed 3327.24 samples/sec Loss 3.5605 LearningRate 0.0166 Epoch: 11 Global Step: 147300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:29:00,948-Speed 3314.77 samples/sec Loss 3.4326 LearningRate 0.0166 Epoch: 11 Global Step: 147310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:29:04,076-Speed 3275.47 samples/sec Loss 3.5513 LearningRate 0.0166 Epoch: 11 Global Step: 147320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:29:07,175-Speed 3304.41 samples/sec Loss 3.5196 LearningRate 0.0166 Epoch: 11 Global Step: 147330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:10,290-Speed 3288.42 samples/sec Loss 3.5254 LearningRate 0.0166 Epoch: 11 Global Step: 147340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:13,425-Speed 3268.48 samples/sec Loss 3.5782 LearningRate 0.0166 Epoch: 11 Global Step: 147350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:16,493-Speed 3338.36 samples/sec Loss 3.5945 LearningRate 0.0165 Epoch: 11 Global Step: 147360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:19,580-Speed 3317.65 samples/sec Loss 3.5886 LearningRate 0.0165 Epoch: 11 Global Step: 147370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:22,668-Speed 3317.45 samples/sec Loss 3.4799 LearningRate 0.0165 Epoch: 11 Global Step: 147380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:25,766-Speed 3306.16 samples/sec Loss 3.5587 LearningRate 0.0165 Epoch: 11 Global Step: 147390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:29:28,828-Speed 3345.54 samples/sec Loss 3.5028 LearningRate 0.0165 Epoch: 11 Global Step: 147400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:31,974-Speed 3255.43 samples/sec Loss 3.4910 LearningRate 0.0165 Epoch: 11 Global Step: 147410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:35,054-Speed 3326.37 samples/sec Loss 3.4823 LearningRate 0.0165 Epoch: 11 Global Step: 147420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:38,234-Speed 3221.25 samples/sec Loss 3.6347 LearningRate 0.0165 Epoch: 11 Global Step: 147430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:29:41,322-Speed 3316.54 samples/sec Loss 3.4752 LearningRate 0.0165 Epoch: 11 Global Step: 147440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:44,377-Speed 3353.87 samples/sec Loss 3.5374 LearningRate 0.0165 Epoch: 11 Global Step: 147450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:47,507-Speed 3272.03 samples/sec Loss 3.4836 LearningRate 0.0165 Epoch: 11 Global Step: 147460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:50,697-Speed 3210.96 samples/sec Loss 3.4663 LearningRate 0.0165 Epoch: 11 Global Step: 147470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:53,827-Speed 3273.37 samples/sec Loss 3.5404 LearningRate 0.0165 Epoch: 11 Global Step: 147480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:29:56,930-Speed 3300.85 samples/sec Loss 3.4895 LearningRate 0.0165 Epoch: 11 Global Step: 147490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:00,014-Speed 3320.99 samples/sec Loss 3.5848 LearningRate 0.0165 Epoch: 11 Global Step: 147500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:03,104-Speed 3315.65 samples/sec Loss 3.5993 LearningRate 0.0165 Epoch: 11 Global Step: 147510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:06,261-Speed 3243.91 samples/sec Loss 
3.5202 LearningRate 0.0165 Epoch: 11 Global Step: 147520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:09,327-Speed 3341.55 samples/sec Loss 3.5365 LearningRate 0.0165 Epoch: 11 Global Step: 147530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:12,470-Speed 3259.30 samples/sec Loss 3.5136 LearningRate 0.0165 Epoch: 11 Global Step: 147540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:15,603-Speed 3269.55 samples/sec Loss 3.5390 LearningRate 0.0165 Epoch: 11 Global Step: 147550 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:18,781-Speed 3223.29 samples/sec Loss 3.5011 LearningRate 0.0165 Epoch: 11 Global Step: 147560 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:21,871-Speed 3314.97 samples/sec Loss 3.4292 LearningRate 0.0165 Epoch: 11 Global Step: 147570 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:25,037-Speed 3235.63 samples/sec Loss 3.5163 LearningRate 0.0165 Epoch: 11 Global Step: 147580 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:28,123-Speed 3318.64 samples/sec Loss 3.5379 LearningRate 0.0165 Epoch: 11 Global Step: 147590 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:31,286-Speed 3238.95 samples/sec Loss 3.5226 LearningRate 0.0165 Epoch: 11 Global Step: 147600 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:34,364-Speed 3327.81 samples/sec Loss 3.4312 LearningRate 0.0165 Epoch: 11 Global Step: 147610 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:37,490-Speed 3276.79 samples/sec Loss 3.5533 LearningRate 0.0165 Epoch: 11 Global Step: 147620 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:40,639-Speed 3253.09 samples/sec Loss 3.5883 LearningRate 0.0165 Epoch: 11 Global Step: 147630 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:43,749-Speed 3293.48 samples/sec Loss 3.5904 LearningRate 0.0165 Epoch: 11 Global Step: 
147640 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:30:46,882-Speed 3269.59 samples/sec Loss 3.4545 LearningRate 0.0165 Epoch: 11 Global Step: 147650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:50,065-Speed 3218.32 samples/sec Loss 3.5568 LearningRate 0.0165 Epoch: 11 Global Step: 147660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:53,293-Speed 3172.72 samples/sec Loss 3.5333 LearningRate 0.0164 Epoch: 11 Global Step: 147670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:56,404-Speed 3293.22 samples/sec Loss 3.5752 LearningRate 0.0164 Epoch: 11 Global Step: 147680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:30:59,559-Speed 3246.24 samples/sec Loss 3.5456 LearningRate 0.0164 Epoch: 11 Global Step: 147690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:31:02,656-Speed 3307.73 samples/sec Loss 3.5225 LearningRate 0.0164 Epoch: 11 Global Step: 147700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:31:05,774-Speed 3285.28 samples/sec Loss 3.5807 LearningRate 0.0164 Epoch: 11 Global Step: 147710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:31:08,853-Speed 3326.72 samples/sec Loss 3.5157 LearningRate 0.0164 Epoch: 11 Global Step: 147720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:31:11,970-Speed 3286.19 samples/sec Loss 3.5041 LearningRate 0.0164 Epoch: 11 Global Step: 147730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:31:15,126-Speed 3246.08 samples/sec Loss 3.4634 LearningRate 0.0164 Epoch: 11 Global Step: 147740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:31:18,269-Speed 3258.67 samples/sec Loss 3.4807 LearningRate 0.0164 Epoch: 11 Global Step: 147750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:21,389-Speed 3283.57 samples/sec Loss 3.5216 LearningRate 0.0164 Epoch: 11 Global Step: 147760 Fp16 Grad Scale: 32768 Required: 9 
hours Training: 2022-04-27 14:31:24,580-Speed 3210.03 samples/sec Loss 3.5507 LearningRate 0.0164 Epoch: 11 Global Step: 147770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:27,799-Speed 3181.87 samples/sec Loss 3.4320 LearningRate 0.0164 Epoch: 11 Global Step: 147780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:30,922-Speed 3279.62 samples/sec Loss 3.5470 LearningRate 0.0164 Epoch: 11 Global Step: 147790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:33,996-Speed 3332.49 samples/sec Loss 3.4950 LearningRate 0.0164 Epoch: 11 Global Step: 147800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:37,117-Speed 3282.37 samples/sec Loss 3.4999 LearningRate 0.0164 Epoch: 11 Global Step: 147810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:40,247-Speed 3271.89 samples/sec Loss 3.4975 LearningRate 0.0164 Epoch: 11 Global Step: 147820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:43,351-Speed 3300.11 samples/sec Loss 3.5384 LearningRate 0.0164 Epoch: 11 Global Step: 147830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:46,427-Speed 3330.60 samples/sec Loss 3.5398 LearningRate 0.0164 Epoch: 11 Global Step: 147840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:49,539-Speed 3291.57 samples/sec Loss 3.4546 LearningRate 0.0164 Epoch: 11 Global Step: 147850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:31:52,609-Speed 3336.86 samples/sec Loss 3.4865 LearningRate 0.0164 Epoch: 11 Global Step: 147860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:55,698-Speed 3316.51 samples/sec Loss 3.5066 LearningRate 0.0164 Epoch: 11 Global Step: 147870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:31:58,814-Speed 3287.09 samples/sec Loss 3.5009 LearningRate 0.0164 Epoch: 11 Global Step: 147880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
14:32:01,943-Speed 3273.09 samples/sec Loss 3.5200 LearningRate 0.0164 Epoch: 11 Global Step: 147890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:05,130-Speed 3214.01 samples/sec Loss 3.5138 LearningRate 0.0164 Epoch: 11 Global Step: 147900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:08,208-Speed 3328.15 samples/sec Loss 3.5505 LearningRate 0.0164 Epoch: 11 Global Step: 147910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:11,338-Speed 3272.77 samples/sec Loss 3.5289 LearningRate 0.0164 Epoch: 11 Global Step: 147920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:14,445-Speed 3296.45 samples/sec Loss 3.5601 LearningRate 0.0164 Epoch: 11 Global Step: 147930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:17,530-Speed 3320.28 samples/sec Loss 3.4179 LearningRate 0.0164 Epoch: 11 Global Step: 147940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:20,600-Speed 3336.56 samples/sec Loss 3.4891 LearningRate 0.0164 Epoch: 11 Global Step: 147950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:23,760-Speed 3241.55 samples/sec Loss 3.4607 LearningRate 0.0164 Epoch: 11 Global Step: 147960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:32:26,862-Speed 3302.42 samples/sec Loss 3.5336 LearningRate 0.0164 Epoch: 11 Global Step: 147970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:32:29,970-Speed 3295.07 samples/sec Loss 3.4831 LearningRate 0.0163 Epoch: 11 Global Step: 147980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:32:33,096-Speed 3277.16 samples/sec Loss 3.4907 LearningRate 0.0163 Epoch: 11 Global Step: 147990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:32:36,237-Speed 3261.86 samples/sec Loss 3.5622 LearningRate 0.0163 Epoch: 11 Global Step: 148000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:32:39,441-Speed 3196.40 samples/sec Loss 
3.5405 LearningRate 0.0163 Epoch: 11 Global Step: 148010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:32:42,586-Speed 3257.13 samples/sec Loss 3.4861 LearningRate 0.0163 Epoch: 11 Global Step: 148020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:45,651-Speed 3342.35 samples/sec Loss 3.5962 LearningRate 0.0163 Epoch: 11 Global Step: 148030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:48,819-Speed 3232.98 samples/sec Loss 3.5107 LearningRate 0.0163 Epoch: 11 Global Step: 148040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:51,954-Speed 3267.60 samples/sec Loss 3.5634 LearningRate 0.0163 Epoch: 11 Global Step: 148050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:55,071-Speed 3286.52 samples/sec Loss 3.5642 LearningRate 0.0163 Epoch: 11 Global Step: 148060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:32:58,134-Speed 3343.61 samples/sec Loss 3.5394 LearningRate 0.0163 Epoch: 11 Global Step: 148070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:33:01,192-Speed 3350.39 samples/sec Loss 3.4466 LearningRate 0.0163 Epoch: 11 Global Step: 148080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:04,261-Speed 3337.37 samples/sec Loss 3.4605 LearningRate 0.0163 Epoch: 11 Global Step: 148090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:07,330-Speed 3338.54 samples/sec Loss 3.5109 LearningRate 0.0163 Epoch: 11 Global Step: 148100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:10,383-Speed 3354.23 samples/sec Loss 3.5139 LearningRate 0.0163 Epoch: 11 Global Step: 148110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:13,531-Speed 3254.10 samples/sec Loss 3.5091 LearningRate 0.0163 Epoch: 11 Global Step: 148120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:16,704-Speed 3228.35 samples/sec Loss 3.5695 LearningRate 0.0163 Epoch: 11 Global 
Step: 148130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:19,816-Speed 3291.34 samples/sec Loss 3.5277 LearningRate 0.0163 Epoch: 11 Global Step: 148140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:22,902-Speed 3319.17 samples/sec Loss 3.4796 LearningRate 0.0163 Epoch: 11 Global Step: 148150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:26,009-Speed 3297.27 samples/sec Loss 3.5473 LearningRate 0.0163 Epoch: 11 Global Step: 148160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:29,125-Speed 3287.44 samples/sec Loss 3.6371 LearningRate 0.0163 Epoch: 11 Global Step: 148170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:32,225-Speed 3304.09 samples/sec Loss 3.5921 LearningRate 0.0163 Epoch: 11 Global Step: 148180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:33:35,312-Speed 3318.57 samples/sec Loss 3.4375 LearningRate 0.0163 Epoch: 11 Global Step: 148190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:33:38,474-Speed 3239.01 samples/sec Loss 3.5612 LearningRate 0.0163 Epoch: 11 Global Step: 148200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:41,614-Speed 3262.73 samples/sec Loss 3.5431 LearningRate 0.0163 Epoch: 11 Global Step: 148210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:44,704-Speed 3314.77 samples/sec Loss 3.5323 LearningRate 0.0163 Epoch: 11 Global Step: 148220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:47,782-Speed 3328.44 samples/sec Loss 3.5663 LearningRate 0.0163 Epoch: 11 Global Step: 148230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:50,839-Speed 3350.78 samples/sec Loss 3.5152 LearningRate 0.0163 Epoch: 11 Global Step: 148240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:33:53,929-Speed 3314.66 samples/sec Loss 3.5467 LearningRate 0.0163 Epoch: 11 Global Step: 148250 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 14:33:57,017-Speed 3317.52 samples/sec Loss 3.4600 LearningRate 0.0163 Epoch: 11 Global Step: 148260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:00,153-Speed 3265.75 samples/sec Loss 3.4091 LearningRate 0.0163 Epoch: 11 Global Step: 148270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:03,215-Speed 3345.09 samples/sec Loss 3.4179 LearningRate 0.0162 Epoch: 11 Global Step: 148280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:06,301-Speed 3320.29 samples/sec Loss 3.5812 LearningRate 0.0162 Epoch: 11 Global Step: 148290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:09,369-Speed 3338.32 samples/sec Loss 3.5266 LearningRate 0.0162 Epoch: 11 Global Step: 148300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:12,482-Speed 3290.73 samples/sec Loss 3.4590 LearningRate 0.0162 Epoch: 11 Global Step: 148310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:15,564-Speed 3322.72 samples/sec Loss 3.5018 LearningRate 0.0162 Epoch: 11 Global Step: 148320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:18,620-Speed 3352.11 samples/sec Loss 3.5488 LearningRate 0.0162 Epoch: 11 Global Step: 148330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:21,697-Speed 3328.76 samples/sec Loss 3.4670 LearningRate 0.0162 Epoch: 11 Global Step: 148340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:24,750-Speed 3355.88 samples/sec Loss 3.5260 LearningRate 0.0162 Epoch: 11 Global Step: 148350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:27,865-Speed 3288.59 samples/sec Loss 3.5335 LearningRate 0.0162 Epoch: 11 Global Step: 148360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:30,951-Speed 3319.57 samples/sec Loss 3.5382 LearningRate 0.0162 Epoch: 11 Global Step: 148370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
14:34:34,033-Speed 3322.99 samples/sec Loss 3.4568 LearningRate 0.0162 Epoch: 11 Global Step: 148380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:37,108-Speed 3331.79 samples/sec Loss 3.4956 LearningRate 0.0162 Epoch: 11 Global Step: 148390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:40,243-Speed 3267.05 samples/sec Loss 3.4900 LearningRate 0.0162 Epoch: 11 Global Step: 148400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:34:43,324-Speed 3325.05 samples/sec Loss 3.5198 LearningRate 0.0162 Epoch: 11 Global Step: 148410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:34:46,383-Speed 3348.42 samples/sec Loss 3.4800 LearningRate 0.0162 Epoch: 11 Global Step: 148420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:49,490-Speed 3296.61 samples/sec Loss 3.4942 LearningRate 0.0162 Epoch: 11 Global Step: 148430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:52,594-Speed 3300.24 samples/sec Loss 3.5352 LearningRate 0.0162 Epoch: 11 Global Step: 148440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:55,671-Speed 3328.76 samples/sec Loss 3.5756 LearningRate 0.0162 Epoch: 11 Global Step: 148450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:34:58,738-Speed 3339.86 samples/sec Loss 3.5241 LearningRate 0.0162 Epoch: 11 Global Step: 148460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:01,803-Speed 3341.75 samples/sec Loss 3.5836 LearningRate 0.0162 Epoch: 11 Global Step: 148470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:04,858-Speed 3353.02 samples/sec Loss 3.4852 LearningRate 0.0162 Epoch: 11 Global Step: 148480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:07,912-Speed 3353.76 samples/sec Loss 3.5666 LearningRate 0.0162 Epoch: 11 Global Step: 148490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:10,997-Speed 3320.37 samples/sec Loss 
3.5352 LearningRate 0.0162 Epoch: 11 Global Step: 148500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:14,081-Speed 3321.54 samples/sec Loss 3.5298 LearningRate 0.0162 Epoch: 11 Global Step: 148510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:17,195-Speed 3289.46 samples/sec Loss 3.5920 LearningRate 0.0162 Epoch: 11 Global Step: 148520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:35:20,302-Speed 3296.37 samples/sec Loss 3.5269 LearningRate 0.0162 Epoch: 11 Global Step: 148530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:35:23,374-Speed 3335.01 samples/sec Loss 3.4807 LearningRate 0.0162 Epoch: 11 Global Step: 148540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:35:26,477-Speed 3301.03 samples/sec Loss 3.6043 LearningRate 0.0162 Epoch: 11 Global Step: 148550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:35:29,517-Speed 3369.84 samples/sec Loss 3.5718 LearningRate 0.0162 Epoch: 11 Global Step: 148560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:32,608-Speed 3314.44 samples/sec Loss 3.5346 LearningRate 0.0162 Epoch: 11 Global Step: 148570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:35,708-Speed 3304.34 samples/sec Loss 3.5892 LearningRate 0.0162 Epoch: 11 Global Step: 148580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:38,791-Speed 3321.62 samples/sec Loss 3.5743 LearningRate 0.0161 Epoch: 11 Global Step: 148590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:41,887-Speed 3309.61 samples/sec Loss 3.4775 LearningRate 0.0161 Epoch: 11 Global Step: 148600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:44,967-Speed 3325.48 samples/sec Loss 3.5615 LearningRate 0.0161 Epoch: 11 Global Step: 148610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:48,069-Speed 3302.70 samples/sec Loss 3.5200 LearningRate 0.0161 Epoch: 11 Global 
Step: 148620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:51,252-Speed 3217.34 samples/sec Loss 3.4718 LearningRate 0.0161 Epoch: 11 Global Step: 148630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:54,403-Speed 3250.92 samples/sec Loss 3.5299 LearningRate 0.0161 Epoch: 11 Global Step: 148640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:35:57,443-Speed 3369.88 samples/sec Loss 3.4283 LearningRate 0.0161 Epoch: 11 Global Step: 148650 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:00,531-Speed 3317.55 samples/sec Loss 3.4633 LearningRate 0.0161 Epoch: 11 Global Step: 148660 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:03,666-Speed 3267.67 samples/sec Loss 3.4773 LearningRate 0.0161 Epoch: 11 Global Step: 148670 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:06,778-Speed 3290.93 samples/sec Loss 3.5744 LearningRate 0.0161 Epoch: 11 Global Step: 148680 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:09,855-Speed 3328.90 samples/sec Loss 3.5567 LearningRate 0.0161 Epoch: 11 Global Step: 148690 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:12,975-Speed 3283.05 samples/sec Loss 3.4370 LearningRate 0.0161 Epoch: 11 Global Step: 148700 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:16,164-Speed 3212.20 samples/sec Loss 3.5564 LearningRate 0.0161 Epoch: 11 Global Step: 148710 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:19,248-Speed 3321.24 samples/sec Loss 3.4553 LearningRate 0.0161 Epoch: 11 Global Step: 148720 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:22,300-Speed 3357.07 samples/sec Loss 3.6637 LearningRate 0.0161 Epoch: 11 Global Step: 148730 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-27 14:36:25,386-Speed 3318.84 samples/sec Loss 3.5442 LearningRate 0.0161 Epoch: 11 Global Step: 148740 Fp16 Grad Scale: 8192 Required: 9 hours 
Training: 2022-04-27 14:36:28,522-Speed 3266.89 samples/sec Loss 3.4594 LearningRate 0.0161 Epoch: 11 Global Step: 148750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:31,705-Speed 3217.85 samples/sec Loss 3.4364 LearningRate 0.0161 Epoch: 11 Global Step: 148760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:34,768-Speed 3343.83 samples/sec Loss 3.5199 LearningRate 0.0161 Epoch: 11 Global Step: 148770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:37,912-Speed 3258.06 samples/sec Loss 3.5884 LearningRate 0.0161 Epoch: 11 Global Step: 148780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:41,047-Speed 3267.60 samples/sec Loss 3.5324 LearningRate 0.0161 Epoch: 11 Global Step: 148790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:44,131-Speed 3321.98 samples/sec Loss 3.5000 LearningRate 0.0161 Epoch: 11 Global Step: 148800 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:47,218-Speed 3318.14 samples/sec Loss 3.5630 LearningRate 0.0161 Epoch: 11 Global Step: 148810 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:50,345-Speed 3275.20 samples/sec Loss 3.5756 LearningRate 0.0161 Epoch: 11 Global Step: 148820 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:53,423-Speed 3328.11 samples/sec Loss 3.5002 LearningRate 0.0161 Epoch: 11 Global Step: 148830 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:56,499-Speed 3330.56 samples/sec Loss 3.4609 LearningRate 0.0161 Epoch: 11 Global Step: 148840 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:36:59,571-Speed 3333.80 samples/sec Loss 3.5010 LearningRate 0.0161 Epoch: 11 Global Step: 148850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:02,690-Speed 3285.08 samples/sec Loss 3.5420 LearningRate 0.0161 Epoch: 11 Global Step: 148860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:05,770-Speed 
3325.74 samples/sec Loss 3.4581 LearningRate 0.0161 Epoch: 11 Global Step: 148870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:08,852-Speed 3323.00 samples/sec Loss 3.5421 LearningRate 0.0161 Epoch: 11 Global Step: 148880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:11,985-Speed 3269.56 samples/sec Loss 3.5222 LearningRate 0.0161 Epoch: 11 Global Step: 148890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:15,110-Speed 3278.59 samples/sec Loss 3.6086 LearningRate 0.0160 Epoch: 11 Global Step: 148900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:18,213-Speed 3300.27 samples/sec Loss 3.4869 LearningRate 0.0160 Epoch: 11 Global Step: 148910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:21,279-Speed 3341.82 samples/sec Loss 3.5041 LearningRate 0.0160 Epoch: 11 Global Step: 148920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:24,366-Speed 3317.49 samples/sec Loss 3.4508 LearningRate 0.0160 Epoch: 11 Global Step: 148930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:27,546-Speed 3221.96 samples/sec Loss 3.5883 LearningRate 0.0160 Epoch: 11 Global Step: 148940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:30,693-Speed 3254.53 samples/sec Loss 3.6199 LearningRate 0.0160 Epoch: 11 Global Step: 148950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:37:33,753-Speed 3347.70 samples/sec Loss 3.5370 LearningRate 0.0160 Epoch: 11 Global Step: 148960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:36,865-Speed 3291.24 samples/sec Loss 3.4136 LearningRate 0.0160 Epoch: 11 Global Step: 148970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:39,968-Speed 3301.50 samples/sec Loss 3.5619 LearningRate 0.0160 Epoch: 11 Global Step: 148980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:43,104-Speed 3266.57 samples/sec Loss 3.5477 
LearningRate 0.0160 Epoch: 11 Global Step: 148990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:46,184-Speed 3325.91 samples/sec Loss 3.5640 LearningRate 0.0160 Epoch: 11 Global Step: 149000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:49,249-Speed 3341.20 samples/sec Loss 3.5829 LearningRate 0.0160 Epoch: 11 Global Step: 149010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:52,363-Speed 3289.98 samples/sec Loss 3.5982 LearningRate 0.0160 Epoch: 11 Global Step: 149020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:55,510-Speed 3255.04 samples/sec Loss 3.6021 LearningRate 0.0160 Epoch: 11 Global Step: 149030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:37:58,596-Speed 3319.68 samples/sec Loss 3.4916 LearningRate 0.0160 Epoch: 11 Global Step: 149040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:01,871-Speed 3127.61 samples/sec Loss 3.4720 LearningRate 0.0160 Epoch: 11 Global Step: 149050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:33,273-Speed 326.11 samples/sec Loss 2.6755 LearningRate 0.0160 Epoch: 12 Global Step: 149060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:36,731-Speed 2962.98 samples/sec Loss 2.5341 LearningRate 0.0160 Epoch: 12 Global Step: 149070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:39,810-Speed 3326.22 samples/sec Loss 2.4833 LearningRate 0.0160 Epoch: 12 Global Step: 149080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:42,863-Speed 3355.59 samples/sec Loss 2.4804 LearningRate 0.0160 Epoch: 12 Global Step: 149090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:45,919-Speed 3351.44 samples/sec Loss 2.4738 LearningRate 0.0160 Epoch: 12 Global Step: 149100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:49,056-Speed 3265.14 samples/sec Loss 2.5349 LearningRate 0.0160 Epoch: 12 Global Step: 
149110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:52,217-Speed 3240.65 samples/sec Loss 2.5276 LearningRate 0.0160 Epoch: 12 Global Step: 149120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:55,324-Speed 3296.72 samples/sec Loss 2.5013 LearningRate 0.0160 Epoch: 12 Global Step: 149130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:38:58,403-Speed 3327.63 samples/sec Loss 2.5715 LearningRate 0.0160 Epoch: 12 Global Step: 149140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:01,485-Speed 3323.12 samples/sec Loss 2.5366 LearningRate 0.0160 Epoch: 12 Global Step: 149150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:04,618-Speed 3270.05 samples/sec Loss 2.5781 LearningRate 0.0160 Epoch: 12 Global Step: 149160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:07,934-Speed 3089.70 samples/sec Loss 2.4740 LearningRate 0.0160 Epoch: 12 Global Step: 149170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:11,019-Speed 3319.79 samples/sec Loss 2.5162 LearningRate 0.0160 Epoch: 12 Global Step: 149180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:14,113-Speed 3310.90 samples/sec Loss 2.5539 LearningRate 0.0160 Epoch: 12 Global Step: 149190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:17,287-Speed 3227.05 samples/sec Loss 2.5289 LearningRate 0.0160 Epoch: 12 Global Step: 149200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:20,363-Speed 3329.45 samples/sec Loss 2.6049 LearningRate 0.0159 Epoch: 12 Global Step: 149210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:23,492-Speed 3274.01 samples/sec Loss 2.5852 LearningRate 0.0159 Epoch: 12 Global Step: 149220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:26,540-Speed 3360.83 samples/sec Loss 2.5555 LearningRate 0.0159 Epoch: 12 Global Step: 149230 Fp16 Grad Scale: 16384 Required: 9 
hours Training: 2022-04-27 14:39:29,625-Speed 3320.63 samples/sec Loss 2.4744 LearningRate 0.0159 Epoch: 12 Global Step: 149240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:32,683-Speed 3349.53 samples/sec Loss 2.6112 LearningRate 0.0159 Epoch: 12 Global Step: 149250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:35,845-Speed 3239.36 samples/sec Loss 2.5254 LearningRate 0.0159 Epoch: 12 Global Step: 149260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:38,979-Speed 3268.30 samples/sec Loss 2.5550 LearningRate 0.0159 Epoch: 12 Global Step: 149270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:42,097-Speed 3285.59 samples/sec Loss 2.5683 LearningRate 0.0159 Epoch: 12 Global Step: 149280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:39:45,296-Speed 3201.34 samples/sec Loss 2.6110 LearningRate 0.0159 Epoch: 12 Global Step: 149290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:48,495-Speed 3201.91 samples/sec Loss 2.5136 LearningRate 0.0159 Epoch: 12 Global Step: 149300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:51,641-Speed 3256.16 samples/sec Loss 2.5668 LearningRate 0.0159 Epoch: 12 Global Step: 149310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:54,956-Speed 3090.40 samples/sec Loss 2.5980 LearningRate 0.0159 Epoch: 12 Global Step: 149320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:39:58,041-Speed 3320.72 samples/sec Loss 2.5242 LearningRate 0.0159 Epoch: 12 Global Step: 149330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:01,220-Speed 3221.94 samples/sec Loss 2.5081 LearningRate 0.0159 Epoch: 12 Global Step: 149340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:04,352-Speed 3270.25 samples/sec Loss 2.5137 LearningRate 0.0159 Epoch: 12 Global Step: 149350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 
14:40:07,474-Speed 3280.87 samples/sec Loss 2.6445 LearningRate 0.0159 Epoch: 12 Global Step: 149360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:10,566-Speed 3313.36 samples/sec Loss 2.5727 LearningRate 0.0159 Epoch: 12 Global Step: 149370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:13,669-Speed 3300.46 samples/sec Loss 2.5159 LearningRate 0.0159 Epoch: 12 Global Step: 149380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:16,836-Speed 3234.65 samples/sec Loss 2.5346 LearningRate 0.0159 Epoch: 12 Global Step: 149390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 14:40:19,973-Speed 3265.54 samples/sec Loss 2.5757 LearningRate 0.0159 Epoch: 12 Global Step: 149400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:23,060-Speed 3317.38 samples/sec Loss 2.5638 LearningRate 0.0159 Epoch: 12 Global Step: 149410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:26,164-Speed 3300.58 samples/sec Loss 2.6100 LearningRate 0.0159 Epoch: 12 Global Step: 149420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:29,331-Speed 3234.70 samples/sec Loss 2.6836 LearningRate 0.0159 Epoch: 12 Global Step: 149430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:32,461-Speed 3272.88 samples/sec Loss 2.5942 LearningRate 0.0159 Epoch: 12 Global Step: 149440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:35,563-Speed 3302.08 samples/sec Loss 2.5539 LearningRate 0.0159 Epoch: 12 Global Step: 149450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:38,717-Speed 3248.02 samples/sec Loss 2.6428 LearningRate 0.0159 Epoch: 12 Global Step: 149460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:41,802-Speed 3320.36 samples/sec Loss 2.5981 LearningRate 0.0159 Epoch: 12 Global Step: 149470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:40:44,914-Speed 3291.43 samples/sec Loss 
2.4691 LearningRate 0.0159 Epoch: 12 Global Step: 149480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:40:47,997-Speed 3322.93 samples/sec Loss 2.6118 LearningRate 0.0159 Epoch: 12 Global Step: 149490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:40:51,101-Speed 3300.52 samples/sec Loss 2.5127 LearningRate 0.0159 Epoch: 12 Global Step: 149500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:40:54,270-Speed 3232.07 samples/sec Loss 2.5873 LearningRate 0.0159 Epoch: 12 Global Step: 149510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:40:57,361-Speed 3314.38 samples/sec Loss 2.5743 LearningRate 0.0158 Epoch: 12 Global Step: 149520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:00,447-Speed 3318.68 samples/sec Loss 2.5986 LearningRate 0.0158 Epoch: 12 Global Step: 149530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:03,579-Speed 3270.73 samples/sec Loss 2.6285 LearningRate 0.0158 Epoch: 12 Global Step: 149540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:06,721-Speed 3260.06 samples/sec Loss 2.6537 LearningRate 0.0158 Epoch: 12 Global Step: 149550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:09,793-Speed 3334.52 samples/sec Loss 2.5708 LearningRate 0.0158 Epoch: 12 Global Step: 149560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:12,909-Speed 3287.28 samples/sec Loss 2.6036 LearningRate 0.0158 Epoch: 12 Global Step: 149570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:16,042-Speed 3269.62 samples/sec Loss 2.6144 LearningRate 0.0158 Epoch: 12 Global Step: 149580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:41:19,113-Speed 3336.16 samples/sec Loss 2.4931 LearningRate 0.0158 Epoch: 12 Global Step: 149590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:22,212-Speed 3304.85 samples/sec Loss 2.6585 LearningRate 0.0158 Epoch: 12 Global 
Step: 149600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:25,385-Speed 3228.91 samples/sec Loss 2.5799 LearningRate 0.0158 Epoch: 12 Global Step: 149610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:28,487-Speed 3301.21 samples/sec Loss 2.6443 LearningRate 0.0158 Epoch: 12 Global Step: 149620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:31,633-Speed 3255.72 samples/sec Loss 2.5573 LearningRate 0.0158 Epoch: 12 Global Step: 149630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:34,764-Speed 3272.52 samples/sec Loss 2.6034 LearningRate 0.0158 Epoch: 12 Global Step: 149640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:37,932-Speed 3233.08 samples/sec Loss 2.6587 LearningRate 0.0158 Epoch: 12 Global Step: 149650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:41,106-Speed 3226.95 samples/sec Loss 2.6124 LearningRate 0.0158 Epoch: 12 Global Step: 149660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:44,187-Speed 3324.55 samples/sec Loss 2.5559 LearningRate 0.0158 Epoch: 12 Global Step: 149670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:47,273-Speed 3319.84 samples/sec Loss 2.6015 LearningRate 0.0158 Epoch: 12 Global Step: 149680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:50,334-Speed 3346.45 samples/sec Loss 2.6087 LearningRate 0.0158 Epoch: 12 Global Step: 149690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:41:53,419-Speed 3320.39 samples/sec Loss 2.5735 LearningRate 0.0158 Epoch: 12 Global Step: 149700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:56,527-Speed 3295.48 samples/sec Loss 2.5861 LearningRate 0.0158 Epoch: 12 Global Step: 149710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:41:59,647-Speed 3283.19 samples/sec Loss 2.6347 LearningRate 0.0158 Epoch: 12 Global Step: 149720 Fp16 Grad Scale: 16384 
Required: 9 hours Training: 2022-04-27 14:42:02,759-Speed 3291.04 samples/sec Loss 2.6256 LearningRate 0.0158 Epoch: 12 Global Step: 149730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:05,876-Speed 3286.44 samples/sec Loss 2.7044 LearningRate 0.0158 Epoch: 12 Global Step: 149740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:08,963-Speed 3318.08 samples/sec Loss 2.4721 LearningRate 0.0158 Epoch: 12 Global Step: 149750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:12,036-Speed 3333.91 samples/sec Loss 2.5965 LearningRate 0.0158 Epoch: 12 Global Step: 149760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:15,190-Speed 3247.42 samples/sec Loss 2.6915 LearningRate 0.0158 Epoch: 12 Global Step: 149770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:18,337-Speed 3254.38 samples/sec Loss 2.7093 LearningRate 0.0158 Epoch: 12 Global Step: 149780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:21,427-Speed 3315.53 samples/sec Loss 2.6494 LearningRate 0.0158 Epoch: 12 Global Step: 149790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:24,543-Speed 3287.23 samples/sec Loss 2.5196 LearningRate 0.0158 Epoch: 12 Global Step: 149800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:42:27,689-Speed 3255.32 samples/sec Loss 2.6350 LearningRate 0.0158 Epoch: 12 Global Step: 149810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:42:30,817-Speed 3275.20 samples/sec Loss 2.6363 LearningRate 0.0158 Epoch: 12 Global Step: 149820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 14:42:33,891-Speed 3331.97 samples/sec Loss 2.6200 LearningRate 0.0158 Epoch: 12 Global Step: 149830 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:36,983-Speed 3313.10 samples/sec Loss 2.5637 LearningRate 0.0157 Epoch: 12 Global Step: 149840 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 
14:42:40,059-Speed 3330.12 samples/sec Loss 2.6433 LearningRate 0.0157 Epoch: 12 Global Step: 149850 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:43,159-Speed 3304.34 samples/sec Loss 2.6481 LearningRate 0.0157 Epoch: 12 Global Step: 149860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:46,274-Speed 3288.40 samples/sec Loss 2.6866 LearningRate 0.0157 Epoch: 12 Global Step: 149870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:49,365-Speed 3313.23 samples/sec Loss 2.6151 LearningRate 0.0157 Epoch: 12 Global Step: 149880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:52,490-Speed 3278.33 samples/sec Loss 2.6177 LearningRate 0.0157 Epoch: 12 Global Step: 149890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:55,570-Speed 3325.53 samples/sec Loss 2.6366 LearningRate 0.0157 Epoch: 12 Global Step: 149900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 14:42:58,682-Speed 3291.38 samples/sec Loss 2.7103 LearningRate 0.0157 Epoch: 12 Global Step: 149910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:01,787-Speed 3298.65 samples/sec Loss 2.6489 LearningRate 0.0157 Epoch: 12 Global Step: 149920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:04,911-Speed 3279.68 samples/sec Loss 2.6033 LearningRate 0.0157 Epoch: 12 Global Step: 149930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:08,014-Speed 3301.18 samples/sec Loss 2.6772 LearningRate 0.0157 Epoch: 12 Global Step: 149940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:11,109-Speed 3309.38 samples/sec Loss 2.6661 LearningRate 0.0157 Epoch: 12 Global Step: 149950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:14,252-Speed 3258.30 samples/sec Loss 2.6317 LearningRate 0.0157 Epoch: 12 Global Step: 149960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:17,381-Speed 3273.46 samples/sec Loss 
2.6774 LearningRate 0.0157 Epoch: 12 Global Step: 149970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:20,463-Speed 3323.60 samples/sec Loss 2.6760 LearningRate 0.0157 Epoch: 12 Global Step: 149980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:23,513-Speed 3359.23 samples/sec Loss 2.6869 LearningRate 0.0157 Epoch: 12 Global Step: 149990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:26,665-Speed 3249.55 samples/sec Loss 2.5819 LearningRate 0.0157 Epoch: 12 Global Step: 150000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:29,808-Speed 3258.79 samples/sec Loss 2.6161 LearningRate 0.0157 Epoch: 12 Global Step: 150010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:32,883-Speed 3330.62 samples/sec Loss 2.7070 LearningRate 0.0157 Epoch: 12 Global Step: 150020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:43:36,014-Speed 3271.59 samples/sec Loss 2.6663 LearningRate 0.0157 Epoch: 12 Global Step: 150030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 14:43:39,047-Speed 3377.31 samples/sec Loss 2.6590 LearningRate 0.0157 Epoch: 12 Global Step: 150040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:42,119-Speed 3334.56 samples/sec Loss 2.6953 LearningRate 0.0157 Epoch: 12 Global Step: 150050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:45,202-Speed 3323.43 samples/sec Loss 2.6492 LearningRate 0.0157 Epoch: 12 Global Step: 150060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:48,344-Speed 3259.66 samples/sec Loss 2.6914 LearningRate 0.0157 Epoch: 12 Global Step: 150070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:51,416-Speed 3334.12 samples/sec Loss 2.6900 LearningRate 0.0157 Epoch: 12 Global Step: 150080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:54,518-Speed 3302.25 samples/sec Loss 2.6367 LearningRate 0.0157 Epoch: 12 Global 
Step: 150090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:43:57,574-Speed 3351.17 samples/sec Loss 2.6168 LearningRate 0.0157 Epoch: 12 Global Step: 150100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:00,685-Speed 3292.80 samples/sec Loss 2.5910 LearningRate 0.0157 Epoch: 12 Global Step: 150110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:03,867-Speed 3219.24 samples/sec Loss 2.6644 LearningRate 0.0157 Epoch: 12 Global Step: 150120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:06,971-Speed 3299.49 samples/sec Loss 2.6166 LearningRate 0.0157 Epoch: 12 Global Step: 150130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:10,067-Speed 3309.42 samples/sec Loss 2.6781 LearningRate 0.0157 Epoch: 12 Global Step: 150140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:13,141-Speed 3331.92 samples/sec Loss 2.6399 LearningRate 0.0156 Epoch: 12 Global Step: 150150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:16,207-Speed 3341.17 samples/sec Loss 2.6609 LearningRate 0.0156 Epoch: 12 Global Step: 150160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:19,313-Speed 3297.53 samples/sec Loss 2.6798 LearningRate 0.0156 Epoch: 12 Global Step: 150170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:22,380-Speed 3339.74 samples/sec Loss 2.6897 LearningRate 0.0156 Epoch: 12 Global Step: 150180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:25,488-Speed 3295.93 samples/sec Loss 2.6415 LearningRate 0.0156 Epoch: 12 Global Step: 150190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:28,647-Speed 3242.55 samples/sec Loss 2.7288 LearningRate 0.0156 Epoch: 12 Global Step: 150200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:31,754-Speed 3296.87 samples/sec Loss 2.7309 LearningRate 0.0156 Epoch: 12 Global Step: 150210 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 14:44:34,817-Speed 3343.99 samples/sec Loss 2.6415 LearningRate 0.0156 Epoch: 12 Global Step: 150220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:37,897-Speed 3325.34 samples/sec Loss 2.6819 LearningRate 0.0156 Epoch: 12 Global Step: 150230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:40,952-Speed 3353.73 samples/sec Loss 2.6713 LearningRate 0.0156 Epoch: 12 Global Step: 150240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 14:44:44,000-Speed 3360.38 samples/sec Loss 2.6607 LearningRate 0.0156 Epoch: 12 Global Step: 150250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:44:47,059-Speed 3348.37 samples/sec Loss 2.6771 LearningRate 0.0156 Epoch: 12 Global Step: 150260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:50,124-Speed 3342.06 samples/sec Loss 2.6976 LearningRate 0.0156 Epoch: 12 Global Step: 150270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:53,248-Speed 3278.70 samples/sec Loss 2.6186 LearningRate 0.0156 Epoch: 12 Global Step: 150280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:56,347-Speed 3305.70 samples/sec Loss 2.6836 LearningRate 0.0156 Epoch: 12 Global Step: 150290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:44:59,421-Speed 3332.01 samples/sec Loss 2.7303 LearningRate 0.0156 Epoch: 12 Global Step: 150300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:02,491-Speed 3337.15 samples/sec Loss 2.6757 LearningRate 0.0156 Epoch: 12 Global Step: 150310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:05,630-Speed 3262.85 samples/sec Loss 2.7367 LearningRate 0.0156 Epoch: 12 Global Step: 150320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:08,697-Speed 3340.41 samples/sec Loss 2.6820 LearningRate 0.0156 Epoch: 12 Global Step: 150330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
14:45:11,750-Speed 3354.06 samples/sec Loss 2.6855 LearningRate 0.0156 Epoch: 12 Global Step: 150340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:14,877-Speed 3276.15 samples/sec Loss 2.6796 LearningRate 0.0156 Epoch: 12 Global Step: 150350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:17,940-Speed 3344.82 samples/sec Loss 2.7146 LearningRate 0.0156 Epoch: 12 Global Step: 150360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:45:21,036-Speed 3307.47 samples/sec Loss 2.5886 LearningRate 0.0156 Epoch: 12 Global Step: 150370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:24,157-Speed 3282.27 samples/sec Loss 2.7368 LearningRate 0.0156 Epoch: 12 Global Step: 150380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:27,310-Speed 3249.08 samples/sec Loss 2.7463 LearningRate 0.0156 Epoch: 12 Global Step: 150390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:30,430-Speed 3282.98 samples/sec Loss 2.6857 LearningRate 0.0156 Epoch: 12 Global Step: 150400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:33,516-Speed 3318.91 samples/sec Loss 2.6745 LearningRate 0.0156 Epoch: 12 Global Step: 150410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:36,703-Speed 3214.30 samples/sec Loss 2.6376 LearningRate 0.0156 Epoch: 12 Global Step: 150420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:39,845-Speed 3260.00 samples/sec Loss 2.6830 LearningRate 0.0156 Epoch: 12 Global Step: 150430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:42,973-Speed 3275.05 samples/sec Loss 2.7450 LearningRate 0.0156 Epoch: 12 Global Step: 150440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:46,023-Speed 3357.89 samples/sec Loss 2.6900 LearningRate 0.0156 Epoch: 12 Global Step: 150450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:49,163-Speed 3261.97 samples/sec Loss 
2.7346 LearningRate 0.0155 Epoch: 12 Global Step: 150460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:45:52,329-Speed 3236.30 samples/sec Loss 2.7585 LearningRate 0.0155 Epoch: 12 Global Step: 150470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:45:55,406-Speed 3328.50 samples/sec Loss 2.7572 LearningRate 0.0155 Epoch: 12 Global Step: 150480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:45:58,495-Speed 3315.90 samples/sec Loss 2.7672 LearningRate 0.0155 Epoch: 12 Global Step: 150490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:01,595-Speed 3304.07 samples/sec Loss 2.6643 LearningRate 0.0155 Epoch: 12 Global Step: 150500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:04,695-Speed 3305.28 samples/sec Loss 2.6892 LearningRate 0.0155 Epoch: 12 Global Step: 150510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:07,783-Speed 3316.19 samples/sec Loss 2.7306 LearningRate 0.0155 Epoch: 12 Global Step: 150520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:10,844-Speed 3346.61 samples/sec Loss 2.7456 LearningRate 0.0155 Epoch: 12 Global Step: 150530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:13,958-Speed 3289.47 samples/sec Loss 2.7137 LearningRate 0.0155 Epoch: 12 Global Step: 150540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:17,045-Speed 3318.87 samples/sec Loss 2.7146 LearningRate 0.0155 Epoch: 12 Global Step: 150550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:20,140-Speed 3309.16 samples/sec Loss 2.7906 LearningRate 0.0155 Epoch: 12 Global Step: 150560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:23,250-Speed 3294.24 samples/sec Loss 2.8062 LearningRate 0.0155 Epoch: 12 Global Step: 150570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:26,352-Speed 3301.68 samples/sec Loss 2.7162 LearningRate 0.0155 Epoch: 12 Global 
Step: 150580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:29,475-Speed 3279.46 samples/sec Loss 2.7647 LearningRate 0.0155 Epoch: 12 Global Step: 150590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:32,556-Speed 3325.22 samples/sec Loss 2.7347 LearningRate 0.0155 Epoch: 12 Global Step: 150600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:35,632-Speed 3329.81 samples/sec Loss 2.6943 LearningRate 0.0155 Epoch: 12 Global Step: 150610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:38,809-Speed 3223.71 samples/sec Loss 2.6821 LearningRate 0.0155 Epoch: 12 Global Step: 150620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:41,907-Speed 3307.23 samples/sec Loss 2.7812 LearningRate 0.0155 Epoch: 12 Global Step: 150630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:46:44,977-Speed 3336.43 samples/sec Loss 2.7545 LearningRate 0.0155 Epoch: 12 Global Step: 150640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:48,046-Speed 3337.74 samples/sec Loss 2.7288 LearningRate 0.0155 Epoch: 12 Global Step: 150650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:51,121-Speed 3331.71 samples/sec Loss 2.6410 LearningRate 0.0155 Epoch: 12 Global Step: 150660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:54,189-Speed 3338.03 samples/sec Loss 2.7580 LearningRate 0.0155 Epoch: 12 Global Step: 150670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:46:57,323-Speed 3268.75 samples/sec Loss 2.7448 LearningRate 0.0155 Epoch: 12 Global Step: 150680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:47:00,442-Speed 3283.79 samples/sec Loss 2.7118 LearningRate 0.0155 Epoch: 12 Global Step: 150690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:47:03,498-Speed 3351.84 samples/sec Loss 2.8109 LearningRate 0.0155 Epoch: 12 Global Step: 150700 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 14:47:06,645-Speed 3255.19 samples/sec Loss 2.7326 LearningRate 0.0155 Epoch: 12 Global Step: 150710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:09,717-Speed 3333.89 samples/sec Loss 2.7203 LearningRate 0.0155 Epoch: 12 Global Step: 150720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:12,786-Speed 3337.62 samples/sec Loss 2.7439 LearningRate 0.0155 Epoch: 12 Global Step: 150730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:15,936-Speed 3251.21 samples/sec Loss 2.7696 LearningRate 0.0155 Epoch: 12 Global Step: 150740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:19,069-Speed 3270.04 samples/sec Loss 2.7102 LearningRate 0.0155 Epoch: 12 Global Step: 150750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:22,133-Speed 3342.69 samples/sec Loss 2.6517 LearningRate 0.0155 Epoch: 12 Global Step: 150760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:25,202-Speed 3337.54 samples/sec Loss 2.7920 LearningRate 0.0155 Epoch: 12 Global Step: 150770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:28,267-Speed 3342.35 samples/sec Loss 2.7834 LearningRate 0.0154 Epoch: 12 Global Step: 150780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:31,399-Speed 3270.02 samples/sec Loss 2.7909 LearningRate 0.0154 Epoch: 12 Global Step: 150790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:34,483-Speed 3321.49 samples/sec Loss 2.7882 LearningRate 0.0154 Epoch: 12 Global Step: 150800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:47:37,690-Speed 3194.65 samples/sec Loss 2.7365 LearningRate 0.0154 Epoch: 12 Global Step: 150810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:47:40,763-Speed 3332.10 samples/sec Loss 2.8269 LearningRate 0.0154 Epoch: 12 Global Step: 150820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
14:47:43,859-Speed 3309.41 samples/sec Loss 2.7007 LearningRate 0.0154 Epoch: 12 Global Step: 150830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:46,958-Speed 3304.98 samples/sec Loss 2.7947 LearningRate 0.0154 Epoch: 12 Global Step: 150840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:50,031-Speed 3333.66 samples/sec Loss 2.8343 LearningRate 0.0154 Epoch: 12 Global Step: 150850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:53,139-Speed 3295.95 samples/sec Loss 2.7608 LearningRate 0.0154 Epoch: 12 Global Step: 150860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:56,196-Speed 3350.15 samples/sec Loss 2.7407 LearningRate 0.0154 Epoch: 12 Global Step: 150870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:47:59,265-Speed 3337.79 samples/sec Loss 2.6811 LearningRate 0.0154 Epoch: 12 Global Step: 150880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:02,458-Speed 3208.01 samples/sec Loss 2.7063 LearningRate 0.0154 Epoch: 12 Global Step: 150890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:05,594-Speed 3266.43 samples/sec Loss 2.7254 LearningRate 0.0154 Epoch: 12 Global Step: 150900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:08,685-Speed 3313.64 samples/sec Loss 2.8105 LearningRate 0.0154 Epoch: 12 Global Step: 150910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:11,760-Speed 3332.25 samples/sec Loss 2.8219 LearningRate 0.0154 Epoch: 12 Global Step: 150920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:48:14,844-Speed 3320.87 samples/sec Loss 2.8186 LearningRate 0.0154 Epoch: 12 Global Step: 150930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:48:17,960-Speed 3288.17 samples/sec Loss 2.7296 LearningRate 0.0154 Epoch: 12 Global Step: 150940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:48:21,032-Speed 3334.53 samples/sec Loss 
2.7844 LearningRate 0.0154 Epoch: 12 Global Step: 150950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:24,152-Speed 3282.45 samples/sec Loss 2.7369 LearningRate 0.0154 Epoch: 12 Global Step: 150960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:27,272-Speed 3282.88 samples/sec Loss 2.7471 LearningRate 0.0154 Epoch: 12 Global Step: 150970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:30,352-Speed 3325.43 samples/sec Loss 2.7592 LearningRate 0.0154 Epoch: 12 Global Step: 150980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:33,424-Speed 3335.77 samples/sec Loss 2.7644 LearningRate 0.0154 Epoch: 12 Global Step: 150990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:36,572-Speed 3253.52 samples/sec Loss 2.7028 LearningRate 0.0154 Epoch: 12 Global Step: 151000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:39,820-Speed 3153.62 samples/sec Loss 2.7388 LearningRate 0.0154 Epoch: 12 Global Step: 151010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:43,030-Speed 3191.12 samples/sec Loss 2.7671 LearningRate 0.0154 Epoch: 12 Global Step: 151020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:46,117-Speed 3317.72 samples/sec Loss 2.7714 LearningRate 0.0154 Epoch: 12 Global Step: 151030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:49,261-Speed 3258.83 samples/sec Loss 2.7298 LearningRate 0.0154 Epoch: 12 Global Step: 151040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:48:52,314-Speed 3355.06 samples/sec Loss 2.7434 LearningRate 0.0154 Epoch: 12 Global Step: 151050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:48:55,399-Speed 3319.89 samples/sec Loss 2.7747 LearningRate 0.0154 Epoch: 12 Global Step: 151060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:48:58,455-Speed 3351.67 samples/sec Loss 2.7736 LearningRate 0.0154 Epoch: 12 Global 
Step: 151070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:49:01,522-Speed 3339.46 samples/sec Loss 2.8147 LearningRate 0.0154 Epoch: 12 Global Step: 151080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:49:04,626-Speed 3300.84 samples/sec Loss 2.7160 LearningRate 0.0154 Epoch: 12 Global Step: 151090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:49:07,724-Speed 3306.60 samples/sec Loss 2.8317 LearningRate 0.0153 Epoch: 12 Global Step: 151100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:49:10,787-Speed 3343.97 samples/sec Loss 2.7945 LearningRate 0.0153 Epoch: 12 Global Step: 151110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:49:13,870-Speed 3323.11 samples/sec Loss 2.8330 LearningRate 0.0153 Epoch: 12 Global Step: 151120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:16,987-Speed 3286.06 samples/sec Loss 2.7947 LearningRate 0.0153 Epoch: 12 Global Step: 151130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:20,116-Speed 3272.88 samples/sec Loss 2.7584 LearningRate 0.0153 Epoch: 12 Global Step: 151140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:23,171-Speed 3353.46 samples/sec Loss 2.8014 LearningRate 0.0153 Epoch: 12 Global Step: 151150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:26,257-Speed 3318.78 samples/sec Loss 2.7524 LearningRate 0.0153 Epoch: 12 Global Step: 151160 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:29,368-Speed 3292.81 samples/sec Loss 2.8136 LearningRate 0.0153 Epoch: 12 Global Step: 151170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:32,448-Speed 3325.78 samples/sec Loss 2.7045 LearningRate 0.0153 Epoch: 12 Global Step: 151180 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:35,529-Speed 3325.45 samples/sec Loss 2.7350 LearningRate 0.0153 Epoch: 12 Global Step: 151190 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 14:49:38,590-Speed 3345.47 samples/sec Loss 2.8659 LearningRate 0.0153 Epoch: 12 Global Step: 151200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:41,680-Speed 3315.03 samples/sec Loss 2.8041 LearningRate 0.0153 Epoch: 12 Global Step: 151210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:44,779-Speed 3306.28 samples/sec Loss 2.7221 LearningRate 0.0153 Epoch: 12 Global Step: 151220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:47,952-Speed 3228.32 samples/sec Loss 2.7872 LearningRate 0.0153 Epoch: 12 Global Step: 151230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:51,076-Speed 3278.89 samples/sec Loss 2.7516 LearningRate 0.0153 Epoch: 12 Global Step: 151240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:54,178-Speed 3301.33 samples/sec Loss 2.8309 LearningRate 0.0153 Epoch: 12 Global Step: 151250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:49:57,288-Speed 3294.00 samples/sec Loss 2.7767 LearningRate 0.0153 Epoch: 12 Global Step: 151260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:00,355-Speed 3339.79 samples/sec Loss 2.8046 LearningRate 0.0153 Epoch: 12 Global Step: 151270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:03,445-Speed 3315.13 samples/sec Loss 2.7404 LearningRate 0.0153 Epoch: 12 Global Step: 151280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:06,660-Speed 3186.26 samples/sec Loss 2.8527 LearningRate 0.0153 Epoch: 12 Global Step: 151290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:09,758-Speed 3306.59 samples/sec Loss 2.7181 LearningRate 0.0153 Epoch: 12 Global Step: 151300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:12,970-Speed 3188.73 samples/sec Loss 2.7001 LearningRate 0.0153 Epoch: 12 Global Step: 151310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
14:50:16,074-Speed 3300.29 samples/sec Loss 2.8185 LearningRate 0.0153 Epoch: 12 Global Step: 151320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:50:19,201-Speed 3275.93 samples/sec Loss 2.7857 LearningRate 0.0153 Epoch: 12 Global Step: 151330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:50:22,270-Speed 3337.37 samples/sec Loss 2.7680 LearningRate 0.0153 Epoch: 12 Global Step: 151340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:25,398-Speed 3275.68 samples/sec Loss 2.6848 LearningRate 0.0153 Epoch: 12 Global Step: 151350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:28,493-Speed 3309.13 samples/sec Loss 2.8073 LearningRate 0.0153 Epoch: 12 Global Step: 151360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:31,659-Speed 3235.86 samples/sec Loss 2.8033 LearningRate 0.0153 Epoch: 12 Global Step: 151370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:34,740-Speed 3324.83 samples/sec Loss 2.7544 LearningRate 0.0153 Epoch: 12 Global Step: 151380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:37,908-Speed 3233.12 samples/sec Loss 2.8288 LearningRate 0.0153 Epoch: 12 Global Step: 151390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:40,971-Speed 3344.16 samples/sec Loss 2.8278 LearningRate 0.0153 Epoch: 12 Global Step: 151400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:44,056-Speed 3320.29 samples/sec Loss 2.8177 LearningRate 0.0152 Epoch: 12 Global Step: 151410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:47,153-Speed 3307.71 samples/sec Loss 2.7446 LearningRate 0.0152 Epoch: 12 Global Step: 151420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:50,244-Speed 3313.16 samples/sec Loss 2.7975 LearningRate 0.0152 Epoch: 12 Global Step: 151430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:50:53,346-Speed 3302.61 samples/sec Loss 
2.8657 LearningRate 0.0152 Epoch: 12 Global Step: 151440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:50:56,492-Speed 3256.05 samples/sec Loss 2.8971 LearningRate 0.0152 Epoch: 12 Global Step: 151450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:50:59,654-Speed 3239.28 samples/sec Loss 2.7105 LearningRate 0.0152 Epoch: 12 Global Step: 151460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:51:02,782-Speed 3274.89 samples/sec Loss 2.7323 LearningRate 0.0152 Epoch: 12 Global Step: 151470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:51:05,902-Speed 3286.75 samples/sec Loss 2.7840 LearningRate 0.0152 Epoch: 12 Global Step: 151480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:09,028-Speed 3276.69 samples/sec Loss 2.8493 LearningRate 0.0152 Epoch: 12 Global Step: 151490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:12,231-Speed 3198.41 samples/sec Loss 2.7735 LearningRate 0.0152 Epoch: 12 Global Step: 151500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:15,473-Speed 3159.62 samples/sec Loss 2.8021 LearningRate 0.0152 Epoch: 12 Global Step: 151510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:18,584-Speed 3292.02 samples/sec Loss 2.7553 LearningRate 0.0152 Epoch: 12 Global Step: 151520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:21,702-Speed 3285.25 samples/sec Loss 2.7916 LearningRate 0.0152 Epoch: 12 Global Step: 151530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:24,810-Speed 3296.34 samples/sec Loss 2.8522 LearningRate 0.0152 Epoch: 12 Global Step: 151540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:27,895-Speed 3319.72 samples/sec Loss 2.8100 LearningRate 0.0152 Epoch: 12 Global Step: 151550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:31,042-Speed 3255.79 samples/sec Loss 2.8394 LearningRate 0.0152 Epoch: 12 Global 
Step: 151560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:34,146-Speed 3299.21 samples/sec Loss 2.8748 LearningRate 0.0152 Epoch: 12 Global Step: 151570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:37,267-Speed 3282.97 samples/sec Loss 2.8152 LearningRate 0.0152 Epoch: 12 Global Step: 151580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:51:40,376-Speed 3294.04 samples/sec Loss 2.8151 LearningRate 0.0152 Epoch: 12 Global Step: 151590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:43,459-Speed 3322.95 samples/sec Loss 2.8647 LearningRate 0.0152 Epoch: 12 Global Step: 151600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:46,542-Speed 3322.01 samples/sec Loss 2.8133 LearningRate 0.0152 Epoch: 12 Global Step: 151610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:49,635-Speed 3312.43 samples/sec Loss 2.7827 LearningRate 0.0152 Epoch: 12 Global Step: 151620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:52,764-Speed 3273.96 samples/sec Loss 2.7472 LearningRate 0.0152 Epoch: 12 Global Step: 151630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:55,907-Speed 3258.29 samples/sec Loss 2.8479 LearningRate 0.0152 Epoch: 12 Global Step: 151640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:51:59,045-Speed 3264.73 samples/sec Loss 2.8554 LearningRate 0.0152 Epoch: 12 Global Step: 151650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:02,120-Speed 3331.76 samples/sec Loss 2.8107 LearningRate 0.0152 Epoch: 12 Global Step: 151660 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:05,208-Speed 3317.39 samples/sec Loss 2.8768 LearningRate 0.0152 Epoch: 12 Global Step: 151670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:08,275-Speed 3339.45 samples/sec Loss 2.8795 LearningRate 0.0152 Epoch: 12 Global Step: 151680 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 14:52:11,363-Speed 3317.51 samples/sec Loss 2.8168 LearningRate 0.0152 Epoch: 12 Global Step: 151690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:14,548-Speed 3215.65 samples/sec Loss 2.8336 LearningRate 0.0152 Epoch: 12 Global Step: 151700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:17,646-Speed 3306.94 samples/sec Loss 2.7873 LearningRate 0.0152 Epoch: 12 Global Step: 151710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:20,703-Speed 3350.16 samples/sec Loss 2.7423 LearningRate 0.0152 Epoch: 12 Global Step: 151720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:23,862-Speed 3242.58 samples/sec Loss 2.8730 LearningRate 0.0151 Epoch: 12 Global Step: 151730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:26,993-Speed 3271.69 samples/sec Loss 2.8236 LearningRate 0.0151 Epoch: 12 Global Step: 151740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:30,096-Speed 3301.17 samples/sec Loss 2.8298 LearningRate 0.0151 Epoch: 12 Global Step: 151750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:33,181-Speed 3320.53 samples/sec Loss 2.7870 LearningRate 0.0151 Epoch: 12 Global Step: 151760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:52:36,356-Speed 3226.35 samples/sec Loss 2.8319 LearningRate 0.0151 Epoch: 12 Global Step: 151770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:39,469-Speed 3290.03 samples/sec Loss 2.8168 LearningRate 0.0151 Epoch: 12 Global Step: 151780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:42,560-Speed 3313.55 samples/sec Loss 2.7873 LearningRate 0.0151 Epoch: 12 Global Step: 151790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:45,638-Speed 3328.49 samples/sec Loss 2.7881 LearningRate 0.0151 Epoch: 12 Global Step: 151800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
14:52:48,738-Speed 3303.89 samples/sec Loss 2.8609 LearningRate 0.0151 Epoch: 12 Global Step: 151810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:51,916-Speed 3223.55 samples/sec Loss 2.8688 LearningRate 0.0151 Epoch: 12 Global Step: 151820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:55,002-Speed 3319.49 samples/sec Loss 2.7685 LearningRate 0.0151 Epoch: 12 Global Step: 151830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:52:58,060-Speed 3349.50 samples/sec Loss 2.8403 LearningRate 0.0151 Epoch: 12 Global Step: 151840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:01,230-Speed 3230.73 samples/sec Loss 2.8537 LearningRate 0.0151 Epoch: 12 Global Step: 151850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:04,351-Speed 3282.13 samples/sec Loss 2.7659 LearningRate 0.0151 Epoch: 12 Global Step: 151860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:07,435-Speed 3322.46 samples/sec Loss 2.8397 LearningRate 0.0151 Epoch: 12 Global Step: 151870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:53:10,547-Speed 3291.06 samples/sec Loss 2.8588 LearningRate 0.0151 Epoch: 12 Global Step: 151880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:53:13,627-Speed 3326.02 samples/sec Loss 2.8311 LearningRate 0.0151 Epoch: 12 Global Step: 151890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:53:16,700-Speed 3332.64 samples/sec Loss 2.8452 LearningRate 0.0151 Epoch: 12 Global Step: 151900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:53:19,775-Speed 3331.57 samples/sec Loss 2.8033 LearningRate 0.0151 Epoch: 12 Global Step: 151910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:53:22,842-Speed 3339.49 samples/sec Loss 2.8459 LearningRate 0.0151 Epoch: 12 Global Step: 151920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:25,964-Speed 3280.89 samples/sec Loss 
2.8802 LearningRate 0.0151 Epoch: 12 Global Step: 151930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:29,066-Speed 3302.86 samples/sec Loss 2.8591 LearningRate 0.0151 Epoch: 12 Global Step: 151940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:32,163-Speed 3307.45 samples/sec Loss 2.8708 LearningRate 0.0151 Epoch: 12 Global Step: 151950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:35,311-Speed 3253.61 samples/sec Loss 2.8916 LearningRate 0.0151 Epoch: 12 Global Step: 151960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:38,405-Speed 3310.36 samples/sec Loss 2.8598 LearningRate 0.0151 Epoch: 12 Global Step: 151970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:41,483-Speed 3327.65 samples/sec Loss 2.8656 LearningRate 0.0151 Epoch: 12 Global Step: 151980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:44,545-Speed 3344.74 samples/sec Loss 2.8375 LearningRate 0.0151 Epoch: 12 Global Step: 151990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:47,626-Speed 3325.20 samples/sec Loss 2.8375 LearningRate 0.0151 Epoch: 12 Global Step: 152000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:50,733-Speed 3297.40 samples/sec Loss 2.8227 LearningRate 0.0151 Epoch: 12 Global Step: 152010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:53:53,885-Speed 3248.67 samples/sec Loss 2.7921 LearningRate 0.0151 Epoch: 12 Global Step: 152020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:53:56,965-Speed 3325.76 samples/sec Loss 2.8187 LearningRate 0.0151 Epoch: 12 Global Step: 152030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:54:00,092-Speed 3276.26 samples/sec Loss 2.8135 LearningRate 0.0151 Epoch: 12 Global Step: 152040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:54:03,139-Speed 3361.30 samples/sec Loss 2.8861 LearningRate 0.0150 Epoch: 12 Global 
Step: 152050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:06,306-Speed 3234.78 samples/sec Loss 2.7861 LearningRate 0.0150 Epoch: 12 Global Step: 152060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:09,369-Speed 3344.50 samples/sec Loss 2.8452 LearningRate 0.0150 Epoch: 12 Global Step: 152070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:12,526-Speed 3244.24 samples/sec Loss 2.8664 LearningRate 0.0150 Epoch: 12 Global Step: 152080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:15,702-Speed 3224.41 samples/sec Loss 2.8529 LearningRate 0.0150 Epoch: 12 Global Step: 152090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:18,816-Speed 3289.74 samples/sec Loss 2.8739 LearningRate 0.0150 Epoch: 12 Global Step: 152100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:21,914-Speed 3306.10 samples/sec Loss 2.9168 LearningRate 0.0150 Epoch: 12 Global Step: 152110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:24,984-Speed 3336.28 samples/sec Loss 2.8630 LearningRate 0.0150 Epoch: 12 Global Step: 152120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:28,093-Speed 3295.48 samples/sec Loss 2.8410 LearningRate 0.0150 Epoch: 12 Global Step: 152130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:31,265-Speed 3229.19 samples/sec Loss 2.8154 LearningRate 0.0150 Epoch: 12 Global Step: 152140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:34,364-Speed 3304.75 samples/sec Loss 2.8646 LearningRate 0.0150 Epoch: 12 Global Step: 152150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:54:37,514-Speed 3252.26 samples/sec Loss 2.8917 LearningRate 0.0150 Epoch: 12 Global Step: 152160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:54:40,653-Speed 3262.94 samples/sec Loss 2.8929 LearningRate 0.0150 Epoch: 12 Global Step: 152170 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 14:54:43,724-Speed 3336.05 samples/sec Loss 2.8105 LearningRate 0.0150 Epoch: 12 Global Step: 152180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:54:46,831-Speed 3296.42 samples/sec Loss 2.8633 LearningRate 0.0150 Epoch: 12 Global Step: 152190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:54:49,897-Speed 3341.45 samples/sec Loss 2.8343 LearningRate 0.0150 Epoch: 12 Global Step: 152200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:53,008-Speed 3292.42 samples/sec Loss 2.8224 LearningRate 0.0150 Epoch: 12 Global Step: 152210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:56,083-Speed 3330.61 samples/sec Loss 2.8445 LearningRate 0.0150 Epoch: 12 Global Step: 152220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:54:59,168-Speed 3320.87 samples/sec Loss 2.9045 LearningRate 0.0150 Epoch: 12 Global Step: 152230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:02,285-Speed 3285.59 samples/sec Loss 2.8623 LearningRate 0.0150 Epoch: 12 Global Step: 152240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:05,374-Speed 3316.49 samples/sec Loss 2.9289 LearningRate 0.0150 Epoch: 12 Global Step: 152250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:08,491-Speed 3286.33 samples/sec Loss 2.8978 LearningRate 0.0150 Epoch: 12 Global Step: 152260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:11,575-Speed 3321.37 samples/sec Loss 2.8032 LearningRate 0.0150 Epoch: 12 Global Step: 152270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:14,675-Speed 3304.62 samples/sec Loss 2.8512 LearningRate 0.0150 Epoch: 12 Global Step: 152280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:17,845-Speed 3231.14 samples/sec Loss 2.9156 LearningRate 0.0150 Epoch: 12 Global Step: 152290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
14:55:20,964-Speed 3283.54 samples/sec Loss 2.8940 LearningRate 0.0150 Epoch: 12 Global Step: 152300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:55:24,111-Speed 3255.93 samples/sec Loss 2.8199 LearningRate 0.0150 Epoch: 12 Global Step: 152310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:55:27,225-Speed 3288.86 samples/sec Loss 2.8588 LearningRate 0.0150 Epoch: 12 Global Step: 152320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:55:30,300-Speed 3330.86 samples/sec Loss 2.8145 LearningRate 0.0150 Epoch: 12 Global Step: 152330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:55:33,394-Speed 3310.81 samples/sec Loss 2.9065 LearningRate 0.0150 Epoch: 12 Global Step: 152340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:55:36,480-Speed 3318.67 samples/sec Loss 2.9017 LearningRate 0.0150 Epoch: 12 Global Step: 152350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:39,589-Speed 3295.04 samples/sec Loss 2.9460 LearningRate 0.0150 Epoch: 12 Global Step: 152360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:42,749-Speed 3242.25 samples/sec Loss 2.9261 LearningRate 0.0149 Epoch: 12 Global Step: 152370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:45,872-Speed 3279.05 samples/sec Loss 2.8994 LearningRate 0.0149 Epoch: 12 Global Step: 152380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:49,062-Speed 3210.92 samples/sec Loss 2.9056 LearningRate 0.0149 Epoch: 12 Global Step: 152390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:52,202-Speed 3262.09 samples/sec Loss 2.8751 LearningRate 0.0149 Epoch: 12 Global Step: 152400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:55,287-Speed 3320.21 samples/sec Loss 2.8892 LearningRate 0.0149 Epoch: 12 Global Step: 152410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:55:58,362-Speed 3331.50 samples/sec Loss 
2.8440 LearningRate 0.0149 Epoch: 12 Global Step: 152420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:01,508-Speed 3255.92 samples/sec Loss 2.8760 LearningRate 0.0149 Epoch: 12 Global Step: 152430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:04,695-Speed 3213.94 samples/sec Loss 2.8338 LearningRate 0.0149 Epoch: 12 Global Step: 152440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:08,375-Speed 2783.67 samples/sec Loss 2.8552 LearningRate 0.0149 Epoch: 12 Global Step: 152450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:56:11,483-Speed 3294.84 samples/sec Loss 2.8585 LearningRate 0.0149 Epoch: 12 Global Step: 152460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:56:14,654-Speed 3230.81 samples/sec Loss 2.8308 LearningRate 0.0149 Epoch: 12 Global Step: 152470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:56:17,715-Speed 3346.49 samples/sec Loss 2.8852 LearningRate 0.0149 Epoch: 12 Global Step: 152480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:20,836-Speed 3281.36 samples/sec Loss 2.8870 LearningRate 0.0149 Epoch: 12 Global Step: 152490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:23,964-Speed 3274.73 samples/sec Loss 2.9026 LearningRate 0.0149 Epoch: 12 Global Step: 152500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:27,163-Speed 3202.08 samples/sec Loss 2.9009 LearningRate 0.0149 Epoch: 12 Global Step: 152510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:30,265-Speed 3302.48 samples/sec Loss 2.9002 LearningRate 0.0149 Epoch: 12 Global Step: 152520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:33,334-Speed 3336.60 samples/sec Loss 2.8216 LearningRate 0.0149 Epoch: 12 Global Step: 152530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:36,484-Speed 3251.69 samples/sec Loss 2.9389 LearningRate 0.0149 Epoch: 12 Global 
Step: 152540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:39,588-Speed 3300.20 samples/sec Loss 2.9181 LearningRate 0.0149 Epoch: 12 Global Step: 152550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:42,742-Speed 3247.75 samples/sec Loss 2.9402 LearningRate 0.0149 Epoch: 12 Global Step: 152560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:45,850-Speed 3296.23 samples/sec Loss 2.9247 LearningRate 0.0149 Epoch: 12 Global Step: 152570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:48,966-Speed 3287.17 samples/sec Loss 2.9352 LearningRate 0.0149 Epoch: 12 Global Step: 152580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:56:52,135-Speed 3231.99 samples/sec Loss 2.8381 LearningRate 0.0149 Epoch: 12 Global Step: 152590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:55,241-Speed 3297.36 samples/sec Loss 2.8303 LearningRate 0.0149 Epoch: 12 Global Step: 152600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:56:58,384-Speed 3259.42 samples/sec Loss 2.9340 LearningRate 0.0149 Epoch: 12 Global Step: 152610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:01,500-Speed 3287.35 samples/sec Loss 2.9464 LearningRate 0.0149 Epoch: 12 Global Step: 152620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:04,676-Speed 3225.18 samples/sec Loss 2.7991 LearningRate 0.0149 Epoch: 12 Global Step: 152630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:07,777-Speed 3302.95 samples/sec Loss 2.8849 LearningRate 0.0149 Epoch: 12 Global Step: 152640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:10,874-Speed 3307.72 samples/sec Loss 2.9127 LearningRate 0.0149 Epoch: 12 Global Step: 152650 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:14,001-Speed 3275.06 samples/sec Loss 2.9224 LearningRate 0.0149 Epoch: 12 Global Step: 152660 Fp16 Grad Scale: 8192 Required: 
8 hours Training: 2022-04-27 14:57:17,143-Speed 3261.16 samples/sec Loss 2.8330 LearningRate 0.0149 Epoch: 12 Global Step: 152670 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:20,219-Speed 3329.36 samples/sec Loss 2.8876 LearningRate 0.0149 Epoch: 12 Global Step: 152680 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:23,300-Speed 3325.17 samples/sec Loss 2.8786 LearningRate 0.0148 Epoch: 12 Global Step: 152690 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:26,454-Speed 3247.47 samples/sec Loss 2.9300 LearningRate 0.0148 Epoch: 12 Global Step: 152700 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:29,633-Speed 3222.08 samples/sec Loss 2.9057 LearningRate 0.0148 Epoch: 12 Global Step: 152710 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:32,781-Speed 3254.09 samples/sec Loss 2.8804 LearningRate 0.0148 Epoch: 12 Global Step: 152720 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:35,955-Speed 3227.46 samples/sec Loss 2.9216 LearningRate 0.0148 Epoch: 12 Global Step: 152730 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:39,071-Speed 3286.60 samples/sec Loss 2.8949 LearningRate 0.0148 Epoch: 12 Global Step: 152740 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:42,830-Speed 2725.14 samples/sec Loss 2.8981 LearningRate 0.0148 Epoch: 12 Global Step: 152750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:45,921-Speed 3314.54 samples/sec Loss 2.9200 LearningRate 0.0148 Epoch: 12 Global Step: 152760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:49,152-Speed 3170.04 samples/sec Loss 2.8974 LearningRate 0.0148 Epoch: 12 Global Step: 152770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:53,583-Speed 2311.31 samples/sec Loss 2.8456 LearningRate 0.0148 Epoch: 12 Global Step: 152780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:57:56,661-Speed 
3327.94 samples/sec Loss 2.9114 LearningRate 0.0148 Epoch: 12 Global Step: 152790 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:57:59,766-Speed 3299.02 samples/sec Loss 2.9025 LearningRate 0.0148 Epoch: 12 Global Step: 152800 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:02,903-Speed 3265.28 samples/sec Loss 2.9040 LearningRate 0.0148 Epoch: 12 Global Step: 152810 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:06,020-Speed 3286.63 samples/sec Loss 2.8727 LearningRate 0.0148 Epoch: 12 Global Step: 152820 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:09,132-Speed 3291.02 samples/sec Loss 2.8479 LearningRate 0.0148 Epoch: 12 Global Step: 152830 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:12,293-Speed 3240.89 samples/sec Loss 2.9218 LearningRate 0.0148 Epoch: 12 Global Step: 152840 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:15,437-Speed 3258.45 samples/sec Loss 2.8570 LearningRate 0.0148 Epoch: 12 Global Step: 152850 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:18,528-Speed 3313.17 samples/sec Loss 2.9177 LearningRate 0.0148 Epoch: 12 Global Step: 152860 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:21,634-Speed 3297.34 samples/sec Loss 2.9247 LearningRate 0.0148 Epoch: 12 Global Step: 152870 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:24,722-Speed 3318.14 samples/sec Loss 2.9624 LearningRate 0.0148 Epoch: 12 Global Step: 152880 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 14:58:27,845-Speed 3279.14 samples/sec Loss 2.9230 LearningRate 0.0148 Epoch: 12 Global Step: 152890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:31,065-Speed 3181.75 samples/sec Loss 2.9267 LearningRate 0.0148 Epoch: 12 Global Step: 152900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:34,140-Speed 3331.45 samples/sec Loss 2.9326 LearningRate 0.0148 
Epoch: 12 Global Step: 152910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:37,361-Speed 3180.15 samples/sec Loss 2.8416 LearningRate 0.0148 Epoch: 12 Global Step: 152920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:40,480-Speed 3283.34 samples/sec Loss 2.8491 LearningRate 0.0148 Epoch: 12 Global Step: 152930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:43,649-Speed 3232.21 samples/sec Loss 2.9422 LearningRate 0.0148 Epoch: 12 Global Step: 152940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:46,741-Speed 3313.08 samples/sec Loss 2.9659 LearningRate 0.0148 Epoch: 12 Global Step: 152950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:49,843-Speed 3302.55 samples/sec Loss 2.8722 LearningRate 0.0148 Epoch: 12 Global Step: 152960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:52,956-Speed 3290.60 samples/sec Loss 2.9684 LearningRate 0.0148 Epoch: 12 Global Step: 152970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:56,063-Speed 3296.92 samples/sec Loss 2.9615 LearningRate 0.0148 Epoch: 12 Global Step: 152980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:58:59,192-Speed 3273.81 samples/sec Loss 2.9545 LearningRate 0.0148 Epoch: 12 Global Step: 152990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:59:02,330-Speed 3263.86 samples/sec Loss 2.9204 LearningRate 0.0148 Epoch: 12 Global Step: 153000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:05,463-Speed 3269.43 samples/sec Loss 2.9271 LearningRate 0.0148 Epoch: 12 Global Step: 153010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:08,537-Speed 3331.73 samples/sec Loss 2.9418 LearningRate 0.0147 Epoch: 12 Global Step: 153020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:11,626-Speed 3316.19 samples/sec Loss 2.9584 LearningRate 0.0147 Epoch: 12 Global Step: 153030 Fp16 Grad 
Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:14,776-Speed 3251.90 samples/sec Loss 2.8915 LearningRate 0.0147 Epoch: 12 Global Step: 153040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:17,878-Speed 3302.47 samples/sec Loss 2.9275 LearningRate 0.0147 Epoch: 12 Global Step: 153050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:20,965-Speed 3318.37 samples/sec Loss 2.9173 LearningRate 0.0147 Epoch: 12 Global Step: 153060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:24,049-Speed 3321.40 samples/sec Loss 2.9011 LearningRate 0.0147 Epoch: 12 Global Step: 153070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:27,195-Speed 3255.74 samples/sec Loss 2.9432 LearningRate 0.0147 Epoch: 12 Global Step: 153080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:30,326-Speed 3270.96 samples/sec Loss 2.8278 LearningRate 0.0147 Epoch: 12 Global Step: 153090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:33,403-Speed 3329.34 samples/sec Loss 2.8420 LearningRate 0.0147 Epoch: 12 Global Step: 153100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:59:36,566-Speed 3238.96 samples/sec Loss 2.9706 LearningRate 0.0147 Epoch: 12 Global Step: 153110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:59:39,737-Speed 3229.54 samples/sec Loss 2.9533 LearningRate 0.0147 Epoch: 12 Global Step: 153120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:59:42,837-Speed 3304.35 samples/sec Loss 2.9831 LearningRate 0.0147 Epoch: 12 Global Step: 153130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:59:45,938-Speed 3304.21 samples/sec Loss 2.9093 LearningRate 0.0147 Epoch: 12 Global Step: 153140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:59:49,138-Speed 3200.28 samples/sec Loss 2.9607 LearningRate 0.0147 Epoch: 12 Global Step: 153150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-04-27 14:59:52,259-Speed 3282.59 samples/sec Loss 2.9433 LearningRate 0.0147 Epoch: 12 Global Step: 153160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 14:59:55,326-Speed 3339.33 samples/sec Loss 2.9568 LearningRate 0.0147 Epoch: 12 Global Step: 153170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 14:59:58,422-Speed 3308.78 samples/sec Loss 2.9759 LearningRate 0.0147 Epoch: 12 Global Step: 153180 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:01,589-Speed 3234.94 samples/sec Loss 2.9355 LearningRate 0.0147 Epoch: 12 Global Step: 153190 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:04,821-Speed 3169.76 samples/sec Loss 2.9075 LearningRate 0.0147 Epoch: 12 Global Step: 153200 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:07,927-Speed 3298.02 samples/sec Loss 2.9691 LearningRate 0.0147 Epoch: 12 Global Step: 153210 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:10,987-Speed 3347.40 samples/sec Loss 2.8259 LearningRate 0.0147 Epoch: 12 Global Step: 153220 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:14,163-Speed 3225.51 samples/sec Loss 2.9224 LearningRate 0.0147 Epoch: 12 Global Step: 153230 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:17,316-Speed 3248.57 samples/sec Loss 2.8922 LearningRate 0.0147 Epoch: 12 Global Step: 153240 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:20,470-Speed 3247.42 samples/sec Loss 2.9800 LearningRate 0.0147 Epoch: 12 Global Step: 153250 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:23,657-Speed 3214.20 samples/sec Loss 2.9169 LearningRate 0.0147 Epoch: 12 Global Step: 153260 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:26,780-Speed 3280.97 samples/sec Loss 2.9580 LearningRate 0.0147 Epoch: 12 Global Step: 153270 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:29,920-Speed 3261.73 samples/sec 
Loss 2.9266 LearningRate 0.0147 Epoch: 12 Global Step: 153280 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:00:33,034-Speed 3289.31 samples/sec Loss 2.9285 LearningRate 0.0147 Epoch: 12 Global Step: 153290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:36,199-Speed 3235.61 samples/sec Loss 2.9152 LearningRate 0.0147 Epoch: 12 Global Step: 153300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:39,327-Speed 3275.42 samples/sec Loss 2.8791 LearningRate 0.0147 Epoch: 12 Global Step: 153310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:42,424-Speed 3307.10 samples/sec Loss 2.9532 LearningRate 0.0147 Epoch: 12 Global Step: 153320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:45,492-Speed 3338.96 samples/sec Loss 2.8396 LearningRate 0.0147 Epoch: 12 Global Step: 153330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:48,641-Speed 3252.75 samples/sec Loss 2.9231 LearningRate 0.0146 Epoch: 12 Global Step: 153340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:51,764-Speed 3280.01 samples/sec Loss 2.9754 LearningRate 0.0146 Epoch: 12 Global Step: 153350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:54,890-Speed 3276.67 samples/sec Loss 2.9531 LearningRate 0.0146 Epoch: 12 Global Step: 153360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:00:57,943-Speed 3355.74 samples/sec Loss 2.9948 LearningRate 0.0146 Epoch: 12 Global Step: 153370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:01:01,095-Speed 3248.79 samples/sec Loss 2.8603 LearningRate 0.0146 Epoch: 12 Global Step: 153380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:01:04,302-Speed 3194.03 samples/sec Loss 2.9355 LearningRate 0.0146 Epoch: 12 Global Step: 153390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:07,436-Speed 3268.26 samples/sec Loss 3.0195 LearningRate 0.0146 Epoch: 12 
Global Step: 153400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:10,549-Speed 3290.73 samples/sec Loss 2.9797 LearningRate 0.0146 Epoch: 12 Global Step: 153410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:13,685-Speed 3266.79 samples/sec Loss 2.8722 LearningRate 0.0146 Epoch: 12 Global Step: 153420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:16,783-Speed 3305.66 samples/sec Loss 2.9318 LearningRate 0.0146 Epoch: 12 Global Step: 153430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:19,926-Speed 3259.80 samples/sec Loss 2.9932 LearningRate 0.0146 Epoch: 12 Global Step: 153440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:23,033-Speed 3296.87 samples/sec Loss 2.9609 LearningRate 0.0146 Epoch: 12 Global Step: 153450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:26,142-Speed 3294.67 samples/sec Loss 2.9183 LearningRate 0.0146 Epoch: 12 Global Step: 153460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:29,270-Speed 3273.89 samples/sec Loss 2.8915 LearningRate 0.0146 Epoch: 12 Global Step: 153470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:32,411-Speed 3262.07 samples/sec Loss 3.0308 LearningRate 0.0146 Epoch: 12 Global Step: 153480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:35,612-Speed 3199.16 samples/sec Loss 2.9090 LearningRate 0.0146 Epoch: 12 Global Step: 153490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:01:38,733-Speed 3282.14 samples/sec Loss 2.9843 LearningRate 0.0146 Epoch: 12 Global Step: 153500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:41,855-Speed 3281.51 samples/sec Loss 2.9186 LearningRate 0.0146 Epoch: 12 Global Step: 153510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:44,997-Speed 3260.00 samples/sec Loss 3.0105 LearningRate 0.0146 Epoch: 12 Global Step: 153520 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 15:01:48,072-Speed 3330.84 samples/sec Loss 2.9865 LearningRate 0.0146 Epoch: 12 Global Step: 153530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:51,210-Speed 3263.82 samples/sec Loss 2.9201 LearningRate 0.0146 Epoch: 12 Global Step: 153540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:54,319-Speed 3295.17 samples/sec Loss 2.9301 LearningRate 0.0146 Epoch: 12 Global Step: 153550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:01:57,375-Speed 3351.81 samples/sec Loss 2.8869 LearningRate 0.0146 Epoch: 12 Global Step: 153560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:00,469-Speed 3311.62 samples/sec Loss 2.9805 LearningRate 0.0146 Epoch: 12 Global Step: 153570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:03,528-Speed 3348.20 samples/sec Loss 2.8514 LearningRate 0.0146 Epoch: 12 Global Step: 153580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:06,637-Speed 3294.37 samples/sec Loss 2.9823 LearningRate 0.0146 Epoch: 12 Global Step: 153590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:09,716-Speed 3326.95 samples/sec Loss 2.9294 LearningRate 0.0146 Epoch: 12 Global Step: 153600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:12,846-Speed 3272.43 samples/sec Loss 2.9485 LearningRate 0.0146 Epoch: 12 Global Step: 153610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:15,925-Speed 3326.78 samples/sec Loss 2.8968 LearningRate 0.0146 Epoch: 12 Global Step: 153620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:19,056-Speed 3272.13 samples/sec Loss 2.9100 LearningRate 0.0146 Epoch: 12 Global Step: 153630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:22,130-Speed 3331.69 samples/sec Loss 2.9403 LearningRate 0.0146 Epoch: 12 Global Step: 153640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
15:02:25,292-Speed 3239.25 samples/sec Loss 2.9585 LearningRate 0.0146 Epoch: 12 Global Step: 153650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:28,375-Speed 3323.16 samples/sec Loss 2.9125 LearningRate 0.0146 Epoch: 12 Global Step: 153660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:02:31,471-Speed 3307.65 samples/sec Loss 3.0047 LearningRate 0.0145 Epoch: 12 Global Step: 153670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:34,550-Speed 3326.94 samples/sec Loss 2.9273 LearningRate 0.0145 Epoch: 12 Global Step: 153680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:37,701-Speed 3250.90 samples/sec Loss 2.9971 LearningRate 0.0145 Epoch: 12 Global Step: 153690 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:40,889-Speed 3212.89 samples/sec Loss 2.9366 LearningRate 0.0145 Epoch: 12 Global Step: 153700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:43,968-Speed 3326.73 samples/sec Loss 2.9213 LearningRate 0.0145 Epoch: 12 Global Step: 153710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:47,066-Speed 3306.55 samples/sec Loss 2.9635 LearningRate 0.0145 Epoch: 12 Global Step: 153720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:50,176-Speed 3294.11 samples/sec Loss 3.0042 LearningRate 0.0145 Epoch: 12 Global Step: 153730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:53,331-Speed 3246.42 samples/sec Loss 2.9675 LearningRate 0.0145 Epoch: 12 Global Step: 153740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:56,457-Speed 3276.77 samples/sec Loss 2.9444 LearningRate 0.0145 Epoch: 12 Global Step: 153750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:02:59,534-Speed 3329.44 samples/sec Loss 2.9039 LearningRate 0.0145 Epoch: 12 Global Step: 153760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:02,639-Speed 3298.28 samples/sec Loss 
2.9209 LearningRate 0.0145 Epoch: 12 Global Step: 153770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:05,705-Speed 3340.82 samples/sec Loss 2.8491 LearningRate 0.0145 Epoch: 12 Global Step: 153780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:08,808-Speed 3301.88 samples/sec Loss 2.9485 LearningRate 0.0145 Epoch: 12 Global Step: 153790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:11,932-Speed 3278.30 samples/sec Loss 3.0031 LearningRate 0.0145 Epoch: 12 Global Step: 153800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:15,075-Speed 3259.46 samples/sec Loss 2.9722 LearningRate 0.0145 Epoch: 12 Global Step: 153810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:18,130-Speed 3353.01 samples/sec Loss 2.9764 LearningRate 0.0145 Epoch: 12 Global Step: 153820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:21,177-Speed 3361.30 samples/sec Loss 2.9410 LearningRate 0.0145 Epoch: 12 Global Step: 153830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:24,242-Speed 3342.14 samples/sec Loss 2.9326 LearningRate 0.0145 Epoch: 12 Global Step: 153840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:27,348-Speed 3297.84 samples/sec Loss 3.0026 LearningRate 0.0145 Epoch: 12 Global Step: 153850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:30,496-Speed 3254.15 samples/sec Loss 3.0166 LearningRate 0.0145 Epoch: 12 Global Step: 153860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:33,663-Speed 3234.59 samples/sec Loss 2.9555 LearningRate 0.0145 Epoch: 12 Global Step: 153870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:36,832-Speed 3232.58 samples/sec Loss 2.9790 LearningRate 0.0145 Epoch: 12 Global Step: 153880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:39,948-Speed 3287.14 samples/sec Loss 2.8587 LearningRate 0.0145 Epoch: 12 Global 
Step: 153890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:43,084-Speed 3265.62 samples/sec Loss 2.9901 LearningRate 0.0145 Epoch: 12 Global Step: 153900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:46,157-Speed 3333.69 samples/sec Loss 2.9952 LearningRate 0.0145 Epoch: 12 Global Step: 153910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:49,256-Speed 3305.74 samples/sec Loss 3.0375 LearningRate 0.0145 Epoch: 12 Global Step: 153920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:52,385-Speed 3273.42 samples/sec Loss 2.9296 LearningRate 0.0145 Epoch: 12 Global Step: 153930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:03:55,562-Speed 3223.92 samples/sec Loss 2.9149 LearningRate 0.0145 Epoch: 12 Global Step: 153940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:03:58,652-Speed 3315.62 samples/sec Loss 2.9767 LearningRate 0.0145 Epoch: 12 Global Step: 153950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:01,823-Speed 3229.43 samples/sec Loss 2.9495 LearningRate 0.0145 Epoch: 12 Global Step: 153960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:04,911-Speed 3317.23 samples/sec Loss 2.9632 LearningRate 0.0145 Epoch: 12 Global Step: 153970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:07,996-Speed 3320.58 samples/sec Loss 2.9969 LearningRate 0.0145 Epoch: 12 Global Step: 153980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:11,145-Speed 3253.27 samples/sec Loss 2.9558 LearningRate 0.0144 Epoch: 12 Global Step: 153990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:14,220-Speed 3330.43 samples/sec Loss 2.9930 LearningRate 0.0144 Epoch: 12 Global Step: 154000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:17,390-Speed 3231.40 samples/sec Loss 2.9950 LearningRate 0.0144 Epoch: 12 Global Step: 154010 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 15:04:20,463-Speed 3333.76 samples/sec Loss 3.0445 LearningRate 0.0144 Epoch: 12 Global Step: 154020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:23,596-Speed 3269.01 samples/sec Loss 2.9589 LearningRate 0.0144 Epoch: 12 Global Step: 154030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:26,719-Speed 3280.39 samples/sec Loss 2.9654 LearningRate 0.0144 Epoch: 12 Global Step: 154040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:29,850-Speed 3271.54 samples/sec Loss 2.9649 LearningRate 0.0144 Epoch: 12 Global Step: 154050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:32,928-Speed 3327.90 samples/sec Loss 3.0015 LearningRate 0.0144 Epoch: 12 Global Step: 154060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:04:36,010-Speed 3323.97 samples/sec Loss 3.0082 LearningRate 0.0144 Epoch: 12 Global Step: 154070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:39,124-Speed 3288.90 samples/sec Loss 2.9804 LearningRate 0.0144 Epoch: 12 Global Step: 154080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:42,291-Speed 3234.10 samples/sec Loss 2.9858 LearningRate 0.0144 Epoch: 12 Global Step: 154090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:45,393-Speed 3302.83 samples/sec Loss 2.9917 LearningRate 0.0144 Epoch: 12 Global Step: 154100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:48,464-Speed 3334.79 samples/sec Loss 2.9992 LearningRate 0.0144 Epoch: 12 Global Step: 154110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:51,536-Speed 3335.22 samples/sec Loss 2.9562 LearningRate 0.0144 Epoch: 12 Global Step: 154120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:04:54,695-Speed 3242.40 samples/sec Loss 3.0040 LearningRate 0.0144 Epoch: 12 Global Step: 154130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:04:57,772-Speed 3328.52 samples/sec Loss 2.8458 LearningRate 0.0144 Epoch: 12 Global Step: 154140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:00,904-Speed 3270.52 samples/sec Loss 2.9450 LearningRate 0.0144 Epoch: 12 Global Step: 154150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:04,011-Speed 3296.59 samples/sec Loss 2.9397 LearningRate 0.0144 Epoch: 12 Global Step: 154160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:07,167-Speed 3246.42 samples/sec Loss 2.9556 LearningRate 0.0144 Epoch: 12 Global Step: 154170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:05:10,231-Speed 3343.06 samples/sec Loss 3.0105 LearningRate 0.0144 Epoch: 12 Global Step: 154180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:13,317-Speed 3318.82 samples/sec Loss 2.9266 LearningRate 0.0144 Epoch: 12 Global Step: 154190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:16,479-Speed 3239.74 samples/sec Loss 2.9310 LearningRate 0.0144 Epoch: 12 Global Step: 154200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:19,551-Speed 3334.68 samples/sec Loss 2.9808 LearningRate 0.0144 Epoch: 12 Global Step: 154210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:22,626-Speed 3330.69 samples/sec Loss 2.9912 LearningRate 0.0144 Epoch: 12 Global Step: 154220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:25,758-Speed 3271.04 samples/sec Loss 2.9737 LearningRate 0.0144 Epoch: 12 Global Step: 154230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:28,890-Speed 3269.62 samples/sec Loss 3.0587 LearningRate 0.0144 Epoch: 12 Global Step: 154240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:31,998-Speed 3296.07 samples/sec Loss 2.9346 LearningRate 0.0144 Epoch: 12 Global Step: 154250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:35,070-Speed 3333.97 samples/sec Loss 
2.9805 LearningRate 0.0144 Epoch: 12 Global Step: 154260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:38,197-Speed 3276.79 samples/sec Loss 2.9464 LearningRate 0.0144 Epoch: 12 Global Step: 154270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:41,307-Speed 3293.70 samples/sec Loss 2.9616 LearningRate 0.0144 Epoch: 12 Global Step: 154280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:44,420-Speed 3289.99 samples/sec Loss 2.9312 LearningRate 0.0144 Epoch: 12 Global Step: 154290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:47,495-Speed 3331.81 samples/sec Loss 3.0491 LearningRate 0.0144 Epoch: 12 Global Step: 154300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:50,627-Speed 3269.56 samples/sec Loss 2.9908 LearningRate 0.0144 Epoch: 12 Global Step: 154310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:05:53,704-Speed 3330.11 samples/sec Loss 2.9494 LearningRate 0.0143 Epoch: 12 Global Step: 154320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:56,766-Speed 3344.90 samples/sec Loss 3.0089 LearningRate 0.0143 Epoch: 12 Global Step: 154330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:05:59,922-Speed 3244.91 samples/sec Loss 2.9803 LearningRate 0.0143 Epoch: 12 Global Step: 154340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:03,140-Speed 3183.60 samples/sec Loss 3.0495 LearningRate 0.0143 Epoch: 12 Global Step: 154350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:06,315-Speed 3225.93 samples/sec Loss 3.0155 LearningRate 0.0143 Epoch: 12 Global Step: 154360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:09,387-Speed 3334.60 samples/sec Loss 2.8796 LearningRate 0.0143 Epoch: 12 Global Step: 154370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:12,499-Speed 3291.48 samples/sec Loss 3.0509 LearningRate 0.0143 Epoch: 12 Global 
Step: 154380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:15,565-Speed 3341.52 samples/sec Loss 2.9178 LearningRate 0.0143 Epoch: 12 Global Step: 154390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:18,614-Speed 3359.20 samples/sec Loss 2.9908 LearningRate 0.0143 Epoch: 12 Global Step: 154400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:21,681-Speed 3339.05 samples/sec Loss 2.9607 LearningRate 0.0143 Epoch: 12 Global Step: 154410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:24,752-Speed 3335.76 samples/sec Loss 3.0076 LearningRate 0.0143 Epoch: 12 Global Step: 154420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:27,825-Speed 3333.16 samples/sec Loss 2.9919 LearningRate 0.0143 Epoch: 12 Global Step: 154430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:31,002-Speed 3224.49 samples/sec Loss 2.9217 LearningRate 0.0143 Epoch: 12 Global Step: 154440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:06:34,090-Speed 3316.87 samples/sec Loss 2.9793 LearningRate 0.0143 Epoch: 12 Global Step: 154450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:06:37,166-Speed 3330.58 samples/sec Loss 2.9411 LearningRate 0.0143 Epoch: 12 Global Step: 154460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:06:40,287-Speed 3281.78 samples/sec Loss 2.9963 LearningRate 0.0143 Epoch: 12 Global Step: 154470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:06:43,462-Speed 3225.97 samples/sec Loss 2.9615 LearningRate 0.0143 Epoch: 12 Global Step: 154480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:46,569-Speed 3296.82 samples/sec Loss 2.9085 LearningRate 0.0143 Epoch: 12 Global Step: 154490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:49,742-Speed 3228.29 samples/sec Loss 2.9251 LearningRate 0.0143 Epoch: 12 Global Step: 154500 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 15:06:52,866-Speed 3279.11 samples/sec Loss 2.9950 LearningRate 0.0143 Epoch: 12 Global Step: 154510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:55,947-Speed 3325.04 samples/sec Loss 3.0188 LearningRate 0.0143 Epoch: 12 Global Step: 154520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:06:59,021-Speed 3332.05 samples/sec Loss 2.9529 LearningRate 0.0143 Epoch: 12 Global Step: 154530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:07:02,099-Speed 3328.15 samples/sec Loss 2.9359 LearningRate 0.0143 Epoch: 12 Global Step: 154540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:07:05,235-Speed 3266.28 samples/sec Loss 2.9624 LearningRate 0.0143 Epoch: 12 Global Step: 154550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:07:08,357-Speed 3281.04 samples/sec Loss 3.0264 LearningRate 0.0143 Epoch: 12 Global Step: 154560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:07:11,533-Speed 3225.39 samples/sec Loss 2.9542 LearningRate 0.0143 Epoch: 12 Global Step: 154570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:07:14,695-Speed 3239.53 samples/sec Loss 3.0073 LearningRate 0.0143 Epoch: 12 Global Step: 154580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:17,786-Speed 3313.16 samples/sec Loss 2.9730 LearningRate 0.0143 Epoch: 12 Global Step: 154590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:20,836-Speed 3359.32 samples/sec Loss 3.0456 LearningRate 0.0143 Epoch: 12 Global Step: 154600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:23,962-Speed 3276.42 samples/sec Loss 2.9225 LearningRate 0.0143 Epoch: 12 Global Step: 154610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:27,130-Speed 3232.88 samples/sec Loss 3.0799 LearningRate 0.0143 Epoch: 12 Global Step: 154620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:07:30,278-Speed 3254.01 samples/sec Loss 2.9830 LearningRate 0.0143 Epoch: 12 Global Step: 154630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:33,392-Speed 3289.23 samples/sec Loss 2.9991 LearningRate 0.0143 Epoch: 12 Global Step: 154640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:36,510-Speed 3285.56 samples/sec Loss 2.9239 LearningRate 0.0142 Epoch: 12 Global Step: 154650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:39,592-Speed 3324.15 samples/sec Loss 2.9790 LearningRate 0.0142 Epoch: 12 Global Step: 154660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:42,656-Speed 3342.55 samples/sec Loss 2.9773 LearningRate 0.0142 Epoch: 12 Global Step: 154670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:45,719-Speed 3344.32 samples/sec Loss 2.9500 LearningRate 0.0142 Epoch: 12 Global Step: 154680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:07:48,848-Speed 3274.31 samples/sec Loss 3.0198 LearningRate 0.0142 Epoch: 12 Global Step: 154690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:52,028-Speed 3221.52 samples/sec Loss 3.0230 LearningRate 0.0142 Epoch: 12 Global Step: 154700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:55,142-Speed 3290.38 samples/sec Loss 2.9745 LearningRate 0.0142 Epoch: 12 Global Step: 154710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:07:58,243-Speed 3302.77 samples/sec Loss 2.9453 LearningRate 0.0142 Epoch: 12 Global Step: 154720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:08:01,327-Speed 3321.50 samples/sec Loss 2.9840 LearningRate 0.0142 Epoch: 12 Global Step: 154730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:08:04,430-Speed 3300.85 samples/sec Loss 2.9743 LearningRate 0.0142 Epoch: 12 Global Step: 154740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:08:07,567-Speed 3265.90 samples/sec Loss 
2.9667 LearningRate 0.0142 Epoch: 12 Global Step: 154750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:08:10,671-Speed 3299.42 samples/sec Loss 3.1305 LearningRate 0.0142 Epoch: 12 Global Step: 154760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:08:13,790-Speed 3284.26 samples/sec Loss 2.9557 LearningRate 0.0142 Epoch: 12 Global Step: 154770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:08:16,931-Speed 3261.72 samples/sec Loss 3.0122 LearningRate 0.0142 Epoch: 12 Global Step: 154780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:08:19,957-Speed 3384.51 samples/sec Loss 2.9701 LearningRate 0.0142 Epoch: 12 Global Step: 154790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:23,043-Speed 3319.39 samples/sec Loss 2.9412 LearningRate 0.0142 Epoch: 12 Global Step: 154800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:26,108-Speed 3342.01 samples/sec Loss 2.9527 LearningRate 0.0142 Epoch: 12 Global Step: 154810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:29,254-Speed 3255.53 samples/sec Loss 2.9865 LearningRate 0.0142 Epoch: 12 Global Step: 154820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:32,327-Speed 3333.18 samples/sec Loss 2.9437 LearningRate 0.0142 Epoch: 12 Global Step: 154830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:35,439-Speed 3291.66 samples/sec Loss 2.9572 LearningRate 0.0142 Epoch: 12 Global Step: 154840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:38,576-Speed 3265.83 samples/sec Loss 3.0052 LearningRate 0.0142 Epoch: 12 Global Step: 154850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:41,775-Speed 3201.89 samples/sec Loss 2.9817 LearningRate 0.0142 Epoch: 12 Global Step: 154860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:44,856-Speed 3324.10 samples/sec Loss 3.0127 LearningRate 0.0142 Epoch: 12 Global 
Step: 154870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:47,952-Speed 3309.04 samples/sec Loss 2.9882 LearningRate 0.0142 Epoch: 12 Global Step: 154880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:51,138-Speed 3214.49 samples/sec Loss 2.9687 LearningRate 0.0142 Epoch: 12 Global Step: 154890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:54,337-Speed 3202.02 samples/sec Loss 3.0027 LearningRate 0.0142 Epoch: 12 Global Step: 154900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:08:57,441-Speed 3300.11 samples/sec Loss 2.9884 LearningRate 0.0142 Epoch: 12 Global Step: 154910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:00,554-Speed 3290.62 samples/sec Loss 3.0310 LearningRate 0.0142 Epoch: 12 Global Step: 154920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:03,675-Speed 3281.45 samples/sec Loss 2.9620 LearningRate 0.0142 Epoch: 12 Global Step: 154930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:06,852-Speed 3224.74 samples/sec Loss 2.9438 LearningRate 0.0142 Epoch: 12 Global Step: 154940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:09,949-Speed 3306.63 samples/sec Loss 3.0027 LearningRate 0.0142 Epoch: 12 Global Step: 154950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:13,109-Speed 3242.09 samples/sec Loss 3.1157 LearningRate 0.0142 Epoch: 12 Global Step: 154960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:16,235-Speed 3276.59 samples/sec Loss 3.0397 LearningRate 0.0142 Epoch: 12 Global Step: 154970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:19,344-Speed 3294.58 samples/sec Loss 2.9535 LearningRate 0.0141 Epoch: 12 Global Step: 154980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:09:22,431-Speed 3318.28 samples/sec Loss 3.0820 LearningRate 0.0141 Epoch: 12 Global Step: 154990 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 15:09:25,615-Speed 3217.29 samples/sec Loss 3.0154 LearningRate 0.0141 Epoch: 12 Global Step: 155000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:28,753-Speed 3264.09 samples/sec Loss 2.9807 LearningRate 0.0141 Epoch: 12 Global Step: 155010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:31,849-Speed 3309.36 samples/sec Loss 3.0629 LearningRate 0.0141 Epoch: 12 Global Step: 155020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:34,977-Speed 3273.90 samples/sec Loss 2.9895 LearningRate 0.0141 Epoch: 12 Global Step: 155030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:38,119-Speed 3260.28 samples/sec Loss 2.9804 LearningRate 0.0141 Epoch: 12 Global Step: 155040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:41,248-Speed 3273.38 samples/sec Loss 2.9143 LearningRate 0.0141 Epoch: 12 Global Step: 155050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:44,354-Speed 3298.22 samples/sec Loss 2.9577 LearningRate 0.0141 Epoch: 12 Global Step: 155060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:47,482-Speed 3274.52 samples/sec Loss 3.0744 LearningRate 0.0141 Epoch: 12 Global Step: 155070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:50,619-Speed 3265.51 samples/sec Loss 2.9752 LearningRate 0.0141 Epoch: 12 Global Step: 155080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:53,725-Speed 3297.77 samples/sec Loss 3.0156 LearningRate 0.0141 Epoch: 12 Global Step: 155090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:09:56,838-Speed 3290.23 samples/sec Loss 2.9799 LearningRate 0.0141 Epoch: 12 Global Step: 155100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:09:59,971-Speed 3270.04 samples/sec Loss 3.0071 LearningRate 0.0141 Epoch: 12 Global Step: 155110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:10:03,134-Speed 3238.35 samples/sec Loss 2.9687 LearningRate 0.0141 Epoch: 12 Global Step: 155120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:10:06,245-Speed 3291.83 samples/sec Loss 3.0389 LearningRate 0.0141 Epoch: 12 Global Step: 155130 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:09,316-Speed 3335.48 samples/sec Loss 3.1038 LearningRate 0.0141 Epoch: 12 Global Step: 155140 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:12,427-Speed 3292.52 samples/sec Loss 3.0266 LearningRate 0.0141 Epoch: 12 Global Step: 155150 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:15,638-Speed 3190.46 samples/sec Loss 2.9384 LearningRate 0.0141 Epoch: 12 Global Step: 155160 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:18,775-Speed 3264.82 samples/sec Loss 3.0572 LearningRate 0.0141 Epoch: 12 Global Step: 155170 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:21,842-Speed 3340.21 samples/sec Loss 3.0619 LearningRate 0.0141 Epoch: 12 Global Step: 155180 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:24,990-Speed 3253.76 samples/sec Loss 2.9355 LearningRate 0.0141 Epoch: 12 Global Step: 155190 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:28,164-Speed 3227.27 samples/sec Loss 3.0206 LearningRate 0.0141 Epoch: 12 Global Step: 155200 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:31,265-Speed 3303.03 samples/sec Loss 2.9834 LearningRate 0.0141 Epoch: 12 Global Step: 155210 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:34,375-Speed 3293.61 samples/sec Loss 3.0570 LearningRate 0.0141 Epoch: 12 Global Step: 155220 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:10:37,563-Speed 3213.65 samples/sec Loss 2.9449 LearningRate 0.0141 Epoch: 12 Global Step: 155230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:10:40,677-Speed 3289.86 samples/sec Loss 3.0177 
LearningRate 0.0141 Epoch: 12 Global Step: 155240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:10:43,770-Speed 3311.05 samples/sec Loss 3.0328 LearningRate 0.0141 Epoch: 12 Global Step: 155250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:10:46,889-Speed 3284.68 samples/sec Loss 2.9087 LearningRate 0.0141 Epoch: 12 Global Step: 155260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:10:49,951-Speed 3345.02 samples/sec Loss 3.0505 LearningRate 0.0141 Epoch: 12 Global Step: 155270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:10:53,096-Speed 3257.10 samples/sec Loss 3.0360 LearningRate 0.0141 Epoch: 12 Global Step: 155280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:10:56,178-Speed 3323.23 samples/sec Loss 3.0540 LearningRate 0.0141 Epoch: 12 Global Step: 155290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:10:59,269-Speed 3313.93 samples/sec Loss 2.9892 LearningRate 0.0141 Epoch: 12 Global Step: 155300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:02,353-Speed 3322.05 samples/sec Loss 3.1368 LearningRate 0.0140 Epoch: 12 Global Step: 155310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:05,468-Speed 3288.06 samples/sec Loss 2.9602 LearningRate 0.0140 Epoch: 12 Global Step: 155320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:08,538-Speed 3336.25 samples/sec Loss 3.0486 LearningRate 0.0140 Epoch: 12 Global Step: 155330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:11:11,628-Speed 3315.07 samples/sec Loss 2.9756 LearningRate 0.0140 Epoch: 12 Global Step: 155340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:11:14,764-Speed 3265.98 samples/sec Loss 3.0776 LearningRate 0.0140 Epoch: 12 Global Step: 155350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:11:17,889-Speed 3278.33 samples/sec Loss 2.9830 LearningRate 0.0140 Epoch: 12 Global Step: 
155360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:20,999-Speed 3294.08 samples/sec Loss 2.9919 LearningRate 0.0140 Epoch: 12 Global Step: 155370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:24,183-Speed 3217.44 samples/sec Loss 3.0058 LearningRate 0.0140 Epoch: 12 Global Step: 155380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:27,363-Speed 3220.89 samples/sec Loss 2.9751 LearningRate 0.0140 Epoch: 12 Global Step: 155390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:30,520-Speed 3243.64 samples/sec Loss 2.9949 LearningRate 0.0140 Epoch: 12 Global Step: 155400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:33,658-Speed 3264.50 samples/sec Loss 3.0445 LearningRate 0.0140 Epoch: 12 Global Step: 155410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:36,827-Speed 3232.19 samples/sec Loss 3.0299 LearningRate 0.0140 Epoch: 12 Global Step: 155420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:39,980-Speed 3249.49 samples/sec Loss 2.9255 LearningRate 0.0140 Epoch: 12 Global Step: 155430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:43,146-Speed 3235.26 samples/sec Loss 2.9540 LearningRate 0.0140 Epoch: 12 Global Step: 155440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:46,236-Speed 3314.51 samples/sec Loss 3.0368 LearningRate 0.0140 Epoch: 12 Global Step: 155450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:49,407-Speed 3230.61 samples/sec Loss 3.0044 LearningRate 0.0140 Epoch: 12 Global Step: 155460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:52,492-Speed 3320.05 samples/sec Loss 3.0476 LearningRate 0.0140 Epoch: 12 Global Step: 155470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:11:55,558-Speed 3341.59 samples/sec Loss 2.9460 LearningRate 0.0140 Epoch: 12 Global Step: 155480 Fp16 Grad Scale: 16384 Required: 8 
hours Training: 2022-04-27 15:11:58,661-Speed 3301.07 samples/sec Loss 3.0816 LearningRate 0.0140 Epoch: 12 Global Step: 155490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:01,885-Speed 3176.78 samples/sec Loss 3.0370 LearningRate 0.0140 Epoch: 12 Global Step: 155500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:05,073-Speed 3212.94 samples/sec Loss 3.0198 LearningRate 0.0140 Epoch: 12 Global Step: 155510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:08,175-Speed 3302.39 samples/sec Loss 3.0126 LearningRate 0.0140 Epoch: 12 Global Step: 155520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:11,279-Speed 3299.88 samples/sec Loss 2.9875 LearningRate 0.0140 Epoch: 12 Global Step: 155530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:14,414-Speed 3266.86 samples/sec Loss 3.0696 LearningRate 0.0140 Epoch: 12 Global Step: 155540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:17,510-Speed 3308.35 samples/sec Loss 3.0831 LearningRate 0.0140 Epoch: 12 Global Step: 155550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:20,584-Speed 3332.16 samples/sec Loss 2.9327 LearningRate 0.0140 Epoch: 12 Global Step: 155560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:23,741-Speed 3244.68 samples/sec Loss 2.9931 LearningRate 0.0140 Epoch: 12 Global Step: 155570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:26,835-Speed 3311.11 samples/sec Loss 3.0234 LearningRate 0.0140 Epoch: 12 Global Step: 155580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:29,974-Speed 3263.04 samples/sec Loss 2.9754 LearningRate 0.0140 Epoch: 12 Global Step: 155590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:33,093-Speed 3284.50 samples/sec Loss 2.9972 LearningRate 0.0140 Epoch: 12 Global Step: 155600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:12:36,218-Speed 3277.21 samples/sec Loss 3.1027 LearningRate 0.0140 Epoch: 12 Global Step: 155610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:39,349-Speed 3271.55 samples/sec Loss 2.9652 LearningRate 0.0140 Epoch: 12 Global Step: 155620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:42,473-Speed 3278.75 samples/sec Loss 2.9522 LearningRate 0.0140 Epoch: 12 Global Step: 155630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:45,564-Speed 3314.07 samples/sec Loss 3.0080 LearningRate 0.0139 Epoch: 12 Global Step: 155640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:12:48,661-Speed 3308.40 samples/sec Loss 3.1039 LearningRate 0.0139 Epoch: 12 Global Step: 155650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:51,804-Speed 3258.67 samples/sec Loss 3.0353 LearningRate 0.0139 Epoch: 12 Global Step: 155660 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:54,924-Speed 3283.35 samples/sec Loss 3.1765 LearningRate 0.0139 Epoch: 12 Global Step: 155670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:12:58,025-Speed 3303.15 samples/sec Loss 2.9898 LearningRate 0.0139 Epoch: 12 Global Step: 155680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:01,172-Speed 3254.83 samples/sec Loss 3.0516 LearningRate 0.0139 Epoch: 12 Global Step: 155690 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:04,250-Speed 3327.59 samples/sec Loss 3.0214 LearningRate 0.0139 Epoch: 12 Global Step: 155700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:07,397-Speed 3255.38 samples/sec Loss 3.0638 LearningRate 0.0139 Epoch: 12 Global Step: 155710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:10,485-Speed 3316.45 samples/sec Loss 2.9524 LearningRate 0.0139 Epoch: 12 Global Step: 155720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:13,640-Speed 3246.52 samples/sec Loss 
3.0508 LearningRate 0.0139 Epoch: 12 Global Step: 155730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:16,857-Speed 3184.38 samples/sec Loss 2.9394 LearningRate 0.0139 Epoch: 12 Global Step: 155740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:19,955-Speed 3306.90 samples/sec Loss 2.9718 LearningRate 0.0139 Epoch: 12 Global Step: 155750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:23,061-Speed 3297.43 samples/sec Loss 2.9414 LearningRate 0.0139 Epoch: 12 Global Step: 155760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:26,221-Speed 3241.09 samples/sec Loss 3.0032 LearningRate 0.0139 Epoch: 12 Global Step: 155770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:29,306-Speed 3320.50 samples/sec Loss 2.9883 LearningRate 0.0139 Epoch: 12 Global Step: 155780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:32,450-Speed 3258.67 samples/sec Loss 3.1159 LearningRate 0.0139 Epoch: 12 Global Step: 155790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:35,525-Speed 3331.26 samples/sec Loss 3.0164 LearningRate 0.0139 Epoch: 12 Global Step: 155800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:38,624-Speed 3305.03 samples/sec Loss 3.0745 LearningRate 0.0139 Epoch: 12 Global Step: 155810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:41,757-Speed 3269.47 samples/sec Loss 2.9917 LearningRate 0.0139 Epoch: 12 Global Step: 155820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:13:44,831-Speed 3331.75 samples/sec Loss 2.9894 LearningRate 0.0139 Epoch: 12 Global Step: 155830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:47,940-Speed 3295.45 samples/sec Loss 3.0470 LearningRate 0.0139 Epoch: 12 Global Step: 155840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:51,029-Speed 3315.88 samples/sec Loss 3.0736 LearningRate 0.0139 Epoch: 12 Global 
Step: 155850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:54,122-Speed 3311.97 samples/sec Loss 2.9790 LearningRate 0.0139 Epoch: 12 Global Step: 155860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:13:57,261-Speed 3263.36 samples/sec Loss 3.0256 LearningRate 0.0139 Epoch: 12 Global Step: 155870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:00,364-Speed 3300.46 samples/sec Loss 3.0579 LearningRate 0.0139 Epoch: 12 Global Step: 155880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:03,512-Speed 3254.32 samples/sec Loss 3.1105 LearningRate 0.0139 Epoch: 12 Global Step: 155890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:06,637-Speed 3277.79 samples/sec Loss 2.9544 LearningRate 0.0139 Epoch: 12 Global Step: 155900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:09,713-Speed 3329.76 samples/sec Loss 3.0173 LearningRate 0.0139 Epoch: 12 Global Step: 155910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:12,780-Speed 3339.80 samples/sec Loss 2.9949 LearningRate 0.0139 Epoch: 12 Global Step: 155920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:15,944-Speed 3237.76 samples/sec Loss 2.9933 LearningRate 0.0139 Epoch: 12 Global Step: 155930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:14:19,014-Speed 3336.10 samples/sec Loss 3.0529 LearningRate 0.0139 Epoch: 12 Global Step: 155940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:14:22,075-Speed 3346.80 samples/sec Loss 2.9836 LearningRate 0.0139 Epoch: 12 Global Step: 155950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:14:25,150-Speed 3330.93 samples/sec Loss 3.0319 LearningRate 0.0139 Epoch: 12 Global Step: 155960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:28,282-Speed 3269.94 samples/sec Loss 3.0947 LearningRate 0.0138 Epoch: 12 Global Step: 155970 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 15:14:31,401-Speed 3284.59 samples/sec Loss 3.0409 LearningRate 0.0138 Epoch: 12 Global Step: 155980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:34,499-Speed 3306.52 samples/sec Loss 3.0108 LearningRate 0.0138 Epoch: 12 Global Step: 155990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:37,605-Speed 3297.09 samples/sec Loss 2.9926 LearningRate 0.0138 Epoch: 12 Global Step: 156000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:14:40,714-Speed 3294.61 samples/sec Loss 3.0158 LearningRate 0.0138 Epoch: 12 Global Step: 156010 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:14:43,784-Speed 3337.25 samples/sec Loss 3.0366 LearningRate 0.0138 Epoch: 12 Global Step: 156020 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:14:46,909-Speed 3278.03 samples/sec Loss 3.0815 LearningRate 0.0138 Epoch: 12 Global Step: 156030 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:14:50,077-Speed 3233.44 samples/sec Loss 3.0727 LearningRate 0.0138 Epoch: 12 Global Step: 156040 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:14:53,243-Speed 3234.90 samples/sec Loss 3.1230 LearningRate 0.0138 Epoch: 12 Global Step: 156050 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:14:56,384-Speed 3261.53 samples/sec Loss 3.0106 LearningRate 0.0138 Epoch: 12 Global Step: 156060 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:14:59,534-Speed 3251.56 samples/sec Loss 3.0074 LearningRate 0.0138 Epoch: 12 Global Step: 156070 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:15:02,690-Speed 3245.50 samples/sec Loss 3.0389 LearningRate 0.0138 Epoch: 12 Global Step: 156080 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:15:05,841-Speed 3251.29 samples/sec Loss 2.9783 LearningRate 0.0138 Epoch: 12 Global Step: 156090 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 
15:15:08,964-Speed 3279.67 samples/sec Loss 3.0048 LearningRate 0.0138 Epoch: 12 Global Step: 156100 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:15:12,121-Speed 3243.92 samples/sec Loss 2.9793 LearningRate 0.0138 Epoch: 12 Global Step: 156110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:15,253-Speed 3271.11 samples/sec Loss 3.0197 LearningRate 0.0138 Epoch: 12 Global Step: 156120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:18,431-Speed 3222.96 samples/sec Loss 2.9239 LearningRate 0.0138 Epoch: 12 Global Step: 156130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:21,533-Speed 3302.22 samples/sec Loss 3.0136 LearningRate 0.0138 Epoch: 12 Global Step: 156140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:24,666-Speed 3269.72 samples/sec Loss 3.1071 LearningRate 0.0138 Epoch: 12 Global Step: 156150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:27,767-Speed 3302.79 samples/sec Loss 3.0525 LearningRate 0.0138 Epoch: 12 Global Step: 156160 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:30,875-Speed 3296.14 samples/sec Loss 2.9809 LearningRate 0.0138 Epoch: 12 Global Step: 156170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:34,003-Speed 3274.05 samples/sec Loss 2.9904 LearningRate 0.0138 Epoch: 12 Global Step: 156180 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:37,141-Speed 3263.71 samples/sec Loss 3.0556 LearningRate 0.0138 Epoch: 12 Global Step: 156190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:40,249-Speed 3296.04 samples/sec Loss 3.0596 LearningRate 0.0138 Epoch: 12 Global Step: 156200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:43,358-Speed 3294.54 samples/sec Loss 3.0367 LearningRate 0.0138 Epoch: 12 Global Step: 156210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:15:46,429-Speed 3335.87 samples/sec Loss 
3.0639 LearningRate 0.0138 Epoch: 12 Global Step: 156220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:15:49,533-Speed 3300.00 samples/sec Loss 3.1133 LearningRate 0.0138 Epoch: 12 Global Step: 156230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:52,649-Speed 3287.40 samples/sec Loss 3.1074 LearningRate 0.0138 Epoch: 12 Global Step: 156240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:55,780-Speed 3271.49 samples/sec Loss 2.9592 LearningRate 0.0138 Epoch: 12 Global Step: 156250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:15:58,868-Speed 3317.21 samples/sec Loss 2.9617 LearningRate 0.0138 Epoch: 12 Global Step: 156260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:02,016-Speed 3254.05 samples/sec Loss 3.0204 LearningRate 0.0138 Epoch: 12 Global Step: 156270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:05,179-Speed 3238.89 samples/sec Loss 3.0850 LearningRate 0.0138 Epoch: 12 Global Step: 156280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:08,297-Speed 3284.69 samples/sec Loss 3.0174 LearningRate 0.0138 Epoch: 12 Global Step: 156290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:11,381-Speed 3321.24 samples/sec Loss 3.0416 LearningRate 0.0138 Epoch: 12 Global Step: 156300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:14,515-Speed 3268.38 samples/sec Loss 2.9782 LearningRate 0.0137 Epoch: 12 Global Step: 156310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:17,674-Speed 3243.08 samples/sec Loss 3.0462 LearningRate 0.0137 Epoch: 12 Global Step: 156320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:20,787-Speed 3290.03 samples/sec Loss 3.0390 LearningRate 0.0137 Epoch: 12 Global Step: 156330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:23,852-Speed 3341.97 samples/sec Loss 3.0555 LearningRate 0.0137 Epoch: 12 Global 
Step: 156340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:27,001-Speed 3253.44 samples/sec Loss 3.0166 LearningRate 0.0137 Epoch: 12 Global Step: 156350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:30,089-Speed 3316.97 samples/sec Loss 3.1086 LearningRate 0.0137 Epoch: 12 Global Step: 156360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:33,183-Speed 3310.32 samples/sec Loss 2.9964 LearningRate 0.0137 Epoch: 12 Global Step: 156370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:36,272-Speed 3315.99 samples/sec Loss 3.1179 LearningRate 0.0137 Epoch: 12 Global Step: 156380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:39,334-Speed 3345.59 samples/sec Loss 3.0861 LearningRate 0.0137 Epoch: 12 Global Step: 156390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:42,456-Speed 3280.86 samples/sec Loss 3.1008 LearningRate 0.0137 Epoch: 12 Global Step: 156400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:45,544-Speed 3317.66 samples/sec Loss 2.9850 LearningRate 0.0137 Epoch: 12 Global Step: 156410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:16:48,631-Speed 3317.50 samples/sec Loss 3.0914 LearningRate 0.0137 Epoch: 12 Global Step: 156420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:51,807-Speed 3225.55 samples/sec Loss 3.0758 LearningRate 0.0137 Epoch: 12 Global Step: 156430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:16:54,929-Speed 3280.45 samples/sec Loss 3.0043 LearningRate 0.0137 Epoch: 12 Global Step: 156440 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:16:58,004-Speed 3331.98 samples/sec Loss 3.0750 LearningRate 0.0137 Epoch: 12 Global Step: 156450 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:01,088-Speed 3320.96 samples/sec Loss 3.0232 LearningRate 0.0137 Epoch: 12 Global Step: 156460 Fp16 Grad Scale: 8192 Required: 
8 hours Training: 2022-04-27 15:17:04,196-Speed 3295.01 samples/sec Loss 2.9875 LearningRate 0.0137 Epoch: 12 Global Step: 156470 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:07,308-Speed 3291.26 samples/sec Loss 2.9923 LearningRate 0.0137 Epoch: 12 Global Step: 156480 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:10,395-Speed 3319.11 samples/sec Loss 2.9541 LearningRate 0.0137 Epoch: 12 Global Step: 156490 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:13,540-Speed 3257.03 samples/sec Loss 3.0550 LearningRate 0.0137 Epoch: 12 Global Step: 156500 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:16,693-Speed 3248.16 samples/sec Loss 3.0948 LearningRate 0.0137 Epoch: 12 Global Step: 156510 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:19,793-Speed 3304.33 samples/sec Loss 3.0161 LearningRate 0.0137 Epoch: 12 Global Step: 156520 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:22,890-Speed 3307.79 samples/sec Loss 3.0308 LearningRate 0.0137 Epoch: 12 Global Step: 156530 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:17:25,966-Speed 3330.54 samples/sec Loss 3.0111 LearningRate 0.0137 Epoch: 12 Global Step: 156540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:29,053-Speed 3317.48 samples/sec Loss 3.0694 LearningRate 0.0137 Epoch: 12 Global Step: 156550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:32,107-Speed 3353.91 samples/sec Loss 3.0321 LearningRate 0.0137 Epoch: 12 Global Step: 156560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:35,289-Speed 3219.26 samples/sec Loss 3.0019 LearningRate 0.0137 Epoch: 12 Global Step: 156570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:38,362-Speed 3332.93 samples/sec Loss 3.0204 LearningRate 0.0137 Epoch: 12 Global Step: 156580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:41,448-Speed 
3319.63 samples/sec Loss 2.9600 LearningRate 0.0137 Epoch: 12 Global Step: 156590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:44,510-Speed 3345.46 samples/sec Loss 3.0990 LearningRate 0.0137 Epoch: 12 Global Step: 156600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:47,570-Speed 3347.85 samples/sec Loss 3.0815 LearningRate 0.0137 Epoch: 12 Global Step: 156610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:50,705-Speed 3267.73 samples/sec Loss 3.0321 LearningRate 0.0137 Epoch: 12 Global Step: 156620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:53,822-Speed 3286.43 samples/sec Loss 3.1224 LearningRate 0.0137 Epoch: 12 Global Step: 156630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:17:56,879-Speed 3350.10 samples/sec Loss 3.0425 LearningRate 0.0136 Epoch: 12 Global Step: 156640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:17:59,998-Speed 3284.14 samples/sec Loss 3.0271 LearningRate 0.0136 Epoch: 12 Global Step: 156650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:03,108-Speed 3293.85 samples/sec Loss 3.0983 LearningRate 0.0136 Epoch: 12 Global Step: 156660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:06,242-Speed 3268.36 samples/sec Loss 3.1179 LearningRate 0.0136 Epoch: 12 Global Step: 156670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:09,300-Speed 3349.84 samples/sec Loss 3.0086 LearningRate 0.0136 Epoch: 12 Global Step: 156680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:12,376-Speed 3330.74 samples/sec Loss 3.1217 LearningRate 0.0136 Epoch: 12 Global Step: 156690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:15,513-Speed 3264.82 samples/sec Loss 3.0470 LearningRate 0.0136 Epoch: 12 Global Step: 156700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:18,639-Speed 3276.78 samples/sec Loss 3.0169 
LearningRate 0.0136 Epoch: 12 Global Step: 156710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:21,714-Speed 3330.66 samples/sec Loss 3.0547 LearningRate 0.0136 Epoch: 12 Global Step: 156720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:24,787-Speed 3333.83 samples/sec Loss 2.9604 LearningRate 0.0136 Epoch: 12 Global Step: 156730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:27,916-Speed 3273.53 samples/sec Loss 3.0396 LearningRate 0.0136 Epoch: 12 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:31,041-Speed 3277.78 samples/sec Loss 3.1348 LearningRate 0.0136 Epoch: 12 Global Step: 156750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:34,114-Speed 3333.31 samples/sec Loss 3.0327 LearningRate 0.0136 Epoch: 12 Global Step: 156760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:37,259-Speed 3256.65 samples/sec Loss 3.1734 LearningRate 0.0136 Epoch: 12 Global Step: 156770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:40,390-Speed 3271.63 samples/sec Loss 3.0836 LearningRate 0.0136 Epoch: 12 Global Step: 156780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:43,489-Speed 3305.02 samples/sec Loss 3.0474 LearningRate 0.0136 Epoch: 12 Global Step: 156790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:46,603-Speed 3289.43 samples/sec Loss 3.0080 LearningRate 0.0136 Epoch: 12 Global Step: 156800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:18:49,757-Speed 3248.09 samples/sec Loss 2.9369 LearningRate 0.0136 Epoch: 12 Global Step: 156810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:18:52,841-Speed 3321.87 samples/sec Loss 3.0290 LearningRate 0.0136 Epoch: 12 Global Step: 156820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:18:55,916-Speed 3330.65 samples/sec Loss 2.9952 LearningRate 0.0136 Epoch: 12 Global Step: 
156830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:18:59,026-Speed 3294.17 samples/sec Loss 3.0471 LearningRate 0.0136 Epoch: 12 Global Step: 156840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:02,122-Speed 3309.18 samples/sec Loss 3.0708 LearningRate 0.0136 Epoch: 12 Global Step: 156850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:05,192-Speed 3336.03 samples/sec Loss 3.0637 LearningRate 0.0136 Epoch: 12 Global Step: 156860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:08,266-Speed 3331.66 samples/sec Loss 3.0573 LearningRate 0.0136 Epoch: 12 Global Step: 156870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:11,343-Speed 3329.03 samples/sec Loss 3.0881 LearningRate 0.0136 Epoch: 12 Global Step: 156880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:14,447-Speed 3300.58 samples/sec Loss 3.1183 LearningRate 0.0136 Epoch: 12 Global Step: 156890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:17,551-Speed 3299.53 samples/sec Loss 2.9889 LearningRate 0.0136 Epoch: 12 Global Step: 156900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:20,624-Speed 3333.02 samples/sec Loss 3.0393 LearningRate 0.0136 Epoch: 12 Global Step: 156910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:19:23,736-Speed 3291.99 samples/sec Loss 2.9783 LearningRate 0.0136 Epoch: 12 Global Step: 156920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:19:26,780-Speed 3365.04 samples/sec Loss 3.0399 LearningRate 0.0136 Epoch: 12 Global Step: 156930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:29,860-Speed 3325.93 samples/sec Loss 3.0230 LearningRate 0.0136 Epoch: 12 Global Step: 156940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:33,017-Speed 3244.79 samples/sec Loss 3.0467 LearningRate 0.0136 Epoch: 12 Global Step: 156950 Fp16 Grad Scale: 16384 Required: 8 
hours Training: 2022-04-27 15:19:36,104-Speed 3318.21 samples/sec Loss 2.9907 LearningRate 0.0136 Epoch: 12 Global Step: 156960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:39,185-Speed 3325.03 samples/sec Loss 3.0765 LearningRate 0.0136 Epoch: 12 Global Step: 156970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:42,277-Speed 3312.88 samples/sec Loss 3.0148 LearningRate 0.0135 Epoch: 12 Global Step: 156980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:45,359-Speed 3323.52 samples/sec Loss 3.0451 LearningRate 0.0135 Epoch: 12 Global Step: 156990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:48,450-Speed 3313.69 samples/sec Loss 3.1173 LearningRate 0.0135 Epoch: 12 Global Step: 157000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:51,525-Speed 3331.12 samples/sec Loss 3.0603 LearningRate 0.0135 Epoch: 12 Global Step: 157010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:54,575-Speed 3358.64 samples/sec Loss 3.0734 LearningRate 0.0135 Epoch: 12 Global Step: 157020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:19:57,630-Speed 3352.49 samples/sec Loss 3.0077 LearningRate 0.0135 Epoch: 12 Global Step: 157030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:00,701-Speed 3335.91 samples/sec Loss 3.1230 LearningRate 0.0135 Epoch: 12 Global Step: 157040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:03,781-Speed 3326.09 samples/sec Loss 3.0343 LearningRate 0.0135 Epoch: 12 Global Step: 157050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:06,847-Speed 3340.21 samples/sec Loss 3.0236 LearningRate 0.0135 Epoch: 12 Global Step: 157060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:09,894-Speed 3362.37 samples/sec Loss 3.1439 LearningRate 0.0135 Epoch: 12 Global Step: 157070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
15:20:12,989-Speed 3309.75 samples/sec Loss 3.0832 LearningRate 0.0135 Epoch: 12 Global Step: 157080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:16,155-Speed 3235.27 samples/sec Loss 3.1089 LearningRate 0.0135 Epoch: 12 Global Step: 157090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:19,200-Speed 3363.53 samples/sec Loss 3.0366 LearningRate 0.0135 Epoch: 12 Global Step: 157100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:22,255-Speed 3353.90 samples/sec Loss 3.1142 LearningRate 0.0135 Epoch: 12 Global Step: 157110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:25,330-Speed 3330.55 samples/sec Loss 3.0786 LearningRate 0.0135 Epoch: 12 Global Step: 157120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:28,428-Speed 3306.84 samples/sec Loss 3.0495 LearningRate 0.0135 Epoch: 12 Global Step: 157130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:31,512-Speed 3321.17 samples/sec Loss 3.0886 LearningRate 0.0135 Epoch: 12 Global Step: 157140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:34,585-Speed 3332.42 samples/sec Loss 3.0399 LearningRate 0.0135 Epoch: 12 Global Step: 157150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:37,706-Speed 3281.90 samples/sec Loss 3.0365 LearningRate 0.0135 Epoch: 12 Global Step: 157160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:40,814-Speed 3296.30 samples/sec Loss 3.0486 LearningRate 0.0135 Epoch: 12 Global Step: 157170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:43,879-Speed 3342.34 samples/sec Loss 2.9994 LearningRate 0.0135 Epoch: 12 Global Step: 157180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:46,962-Speed 3322.36 samples/sec Loss 3.0450 LearningRate 0.0135 Epoch: 12 Global Step: 157190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:50,038-Speed 3329.28 samples/sec Loss 
2.9689 LearningRate 0.0135 Epoch: 12 Global Step: 157200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:20:53,101-Speed 3344.42 samples/sec Loss 3.0918 LearningRate 0.0135 Epoch: 12 Global Step: 157210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:56,157-Speed 3352.15 samples/sec Loss 3.0842 LearningRate 0.0135 Epoch: 12 Global Step: 157220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:20:59,321-Speed 3237.15 samples/sec Loss 3.0684 LearningRate 0.0135 Epoch: 12 Global Step: 157230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:02,529-Speed 3193.13 samples/sec Loss 3.1161 LearningRate 0.0135 Epoch: 12 Global Step: 157240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:05,724-Speed 3205.67 samples/sec Loss 3.0904 LearningRate 0.0135 Epoch: 12 Global Step: 157250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:08,802-Speed 3328.30 samples/sec Loss 3.0434 LearningRate 0.0135 Epoch: 12 Global Step: 157260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:11,898-Speed 3308.56 samples/sec Loss 3.0412 LearningRate 0.0135 Epoch: 12 Global Step: 157270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:15,038-Speed 3261.98 samples/sec Loss 3.0973 LearningRate 0.0135 Epoch: 12 Global Step: 157280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:18,155-Speed 3286.88 samples/sec Loss 3.0271 LearningRate 0.0135 Epoch: 12 Global Step: 157290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:21,234-Speed 3326.15 samples/sec Loss 3.1199 LearningRate 0.0135 Epoch: 12 Global Step: 157300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:24,296-Speed 3345.52 samples/sec Loss 3.0806 LearningRate 0.0135 Epoch: 12 Global Step: 157310 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:27,363-Speed 3339.93 samples/sec Loss 3.0358 LearningRate 0.0134 Epoch: 12 Global 
Step: 157320 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:30,452-Speed 3315.58 samples/sec Loss 3.0670 LearningRate 0.0134 Epoch: 12 Global Step: 157330 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:33,551-Speed 3306.03 samples/sec Loss 2.9738 LearningRate 0.0134 Epoch: 12 Global Step: 157340 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:36,651-Speed 3303.70 samples/sec Loss 3.0705 LearningRate 0.0134 Epoch: 12 Global Step: 157350 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:39,861-Speed 3191.33 samples/sec Loss 3.0922 LearningRate 0.0134 Epoch: 12 Global Step: 157360 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:43,017-Speed 3245.74 samples/sec Loss 3.0749 LearningRate 0.0134 Epoch: 12 Global Step: 157370 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:46,091-Speed 3331.59 samples/sec Loss 3.0469 LearningRate 0.0134 Epoch: 12 Global Step: 157380 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:49,252-Speed 3240.60 samples/sec Loss 3.1306 LearningRate 0.0134 Epoch: 12 Global Step: 157390 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:52,356-Speed 3301.08 samples/sec Loss 3.0503 LearningRate 0.0134 Epoch: 12 Global Step: 157400 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:21:55,470-Speed 3288.86 samples/sec Loss 3.0552 LearningRate 0.0134 Epoch: 12 Global Step: 157410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:21:58,533-Speed 3343.83 samples/sec Loss 3.0309 LearningRate 0.0134 Epoch: 12 Global Step: 157420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:01,685-Speed 3249.98 samples/sec Loss 3.0977 LearningRate 0.0134 Epoch: 12 Global Step: 157430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:04,846-Speed 3240.27 samples/sec Loss 3.0164 LearningRate 0.0134 Epoch: 12 Global Step: 157440 Fp16 Grad Scale: 16384 Required: 8 
hours Training: 2022-04-27 15:22:08,015-Speed 3232.07 samples/sec Loss 3.0615 LearningRate 0.0134 Epoch: 12 Global Step: 157450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:11,109-Speed 3311.58 samples/sec Loss 3.1751 LearningRate 0.0134 Epoch: 12 Global Step: 157460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:14,269-Speed 3242.10 samples/sec Loss 3.0733 LearningRate 0.0134 Epoch: 12 Global Step: 157470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:17,413-Speed 3257.38 samples/sec Loss 3.1098 LearningRate 0.0134 Epoch: 12 Global Step: 157480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:20,493-Speed 3325.90 samples/sec Loss 3.0679 LearningRate 0.0134 Epoch: 12 Global Step: 157490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:23,590-Speed 3306.91 samples/sec Loss 3.0061 LearningRate 0.0134 Epoch: 12 Global Step: 157500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:26,695-Speed 3300.31 samples/sec Loss 3.0358 LearningRate 0.0134 Epoch: 12 Global Step: 157510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:22:29,791-Speed 3307.82 samples/sec Loss 3.0689 LearningRate 0.0134 Epoch: 12 Global Step: 157520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:22:32,903-Speed 3292.24 samples/sec Loss 3.0501 LearningRate 0.0134 Epoch: 12 Global Step: 157530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:22:36,011-Speed 3295.85 samples/sec Loss 3.0514 LearningRate 0.0134 Epoch: 12 Global Step: 157540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:22:39,114-Speed 3301.29 samples/sec Loss 3.0518 LearningRate 0.0134 Epoch: 12 Global Step: 157550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:22:42,255-Speed 3260.38 samples/sec Loss 3.0400 LearningRate 0.0134 Epoch: 12 Global Step: 157560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:22:45,347-Speed 3313.50 samples/sec Loss 3.0773 LearningRate 0.0134 Epoch: 12 Global Step: 157570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:22:48,475-Speed 3274.54 samples/sec Loss 3.0767 LearningRate 0.0134 Epoch: 12 Global Step: 157580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:22:51,600-Speed 3277.34 samples/sec Loss 3.1095 LearningRate 0.0134 Epoch: 12 Global Step: 157590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:54,754-Speed 3247.59 samples/sec Loss 3.0195 LearningRate 0.0134 Epoch: 12 Global Step: 157600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:22:57,871-Speed 3286.42 samples/sec Loss 3.0442 LearningRate 0.0134 Epoch: 12 Global Step: 157610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:00,971-Speed 3304.26 samples/sec Loss 3.1208 LearningRate 0.0134 Epoch: 12 Global Step: 157620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:04,113-Speed 3260.20 samples/sec Loss 3.0133 LearningRate 0.0134 Epoch: 12 Global Step: 157630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:07,250-Speed 3264.94 samples/sec Loss 3.0718 LearningRate 0.0134 Epoch: 12 Global Step: 157640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:10,366-Speed 3287.84 samples/sec Loss 3.0383 LearningRate 0.0134 Epoch: 12 Global Step: 157650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:13,518-Speed 3249.12 samples/sec Loss 3.0617 LearningRate 0.0133 Epoch: 12 Global Step: 157660 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:16,611-Speed 3311.53 samples/sec Loss 3.0263 LearningRate 0.0133 Epoch: 12 Global Step: 157670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:19,683-Speed 3334.35 samples/sec Loss 3.0616 LearningRate 0.0133 Epoch: 12 Global Step: 157680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:22,781-Speed 3306.85 samples/sec Loss 
3.0181 LearningRate 0.0133 Epoch: 12 Global Step: 157690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:23:25,913-Speed 3270.22 samples/sec Loss 3.1066 LearningRate 0.0133 Epoch: 12 Global Step: 157700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:23:29,054-Speed 3261.84 samples/sec Loss 3.0397 LearningRate 0.0133 Epoch: 12 Global Step: 157710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:23:32,164-Speed 3293.41 samples/sec Loss 3.0069 LearningRate 0.0133 Epoch: 12 Global Step: 157720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:23:35,236-Speed 3335.15 samples/sec Loss 3.0902 LearningRate 0.0133 Epoch: 12 Global Step: 157730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:23:38,354-Speed 3284.69 samples/sec Loss 3.0666 LearningRate 0.0133 Epoch: 12 Global Step: 157740 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:23:41,526-Speed 3229.17 samples/sec Loss 3.0925 LearningRate 0.0133 Epoch: 12 Global Step: 157750 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:23:44,633-Speed 3296.51 samples/sec Loss 3.1663 LearningRate 0.0133 Epoch: 12 Global Step: 157760 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:23:47,705-Speed 3334.13 samples/sec Loss 3.0504 LearningRate 0.0133 Epoch: 12 Global Step: 157770 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:23:50,822-Speed 3286.90 samples/sec Loss 3.0698 LearningRate 0.0133 Epoch: 12 Global Step: 157780 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:23:53,975-Speed 3248.50 samples/sec Loss 3.1245 LearningRate 0.0133 Epoch: 12 Global Step: 157790 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:23:57,053-Speed 3328.44 samples/sec Loss 3.0572 LearningRate 0.0133 Epoch: 12 Global Step: 157800 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:24:00,208-Speed 3246.15 samples/sec Loss 2.9724 LearningRate 0.0133 Epoch: 12 Global Step: 
157810 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:24:03,313-Speed 3299.91 samples/sec Loss 3.1145 LearningRate 0.0133 Epoch: 12 Global Step: 157820 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:24:06,458-Speed 3256.65 samples/sec Loss 3.0618 LearningRate 0.0133 Epoch: 12 Global Step: 157830 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:24:09,570-Speed 3291.73 samples/sec Loss 3.0453 LearningRate 0.0133 Epoch: 12 Global Step: 157840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:12,680-Speed 3293.28 samples/sec Loss 3.0277 LearningRate 0.0133 Epoch: 12 Global Step: 157850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:15,790-Speed 3293.74 samples/sec Loss 3.0491 LearningRate 0.0133 Epoch: 12 Global Step: 157860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:18,966-Speed 3224.77 samples/sec Loss 3.0285 LearningRate 0.0133 Epoch: 12 Global Step: 157870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:22,040-Speed 3333.17 samples/sec Loss 3.1030 LearningRate 0.0133 Epoch: 12 Global Step: 157880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:25,193-Speed 3248.67 samples/sec Loss 3.0890 LearningRate 0.0133 Epoch: 12 Global Step: 157890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:28,380-Speed 3213.80 samples/sec Loss 3.0635 LearningRate 0.0133 Epoch: 12 Global Step: 157900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:31,484-Speed 3299.68 samples/sec Loss 3.1203 LearningRate 0.0133 Epoch: 12 Global Step: 157910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:34,545-Speed 3346.68 samples/sec Loss 3.0488 LearningRate 0.0133 Epoch: 12 Global Step: 157920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:24:37,700-Speed 3246.31 samples/sec Loss 3.0281 LearningRate 0.0133 Epoch: 12 Global Step: 157930 Fp16 Grad Scale: 16384 Required: 8 
hours Training: 2022-04-27 15:24:40,864-Speed 3238.07 samples/sec Loss 3.0359 LearningRate 0.0133 Epoch: 12 Global Step: 157940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:24:43,980-Speed 3286.71 samples/sec Loss 3.0657 LearningRate 0.0133 Epoch: 12 Global Step: 157950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:24:47,074-Speed 3310.49 samples/sec Loss 3.0634 LearningRate 0.0133 Epoch: 12 Global Step: 157960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:24:50,268-Speed 3207.19 samples/sec Loss 3.0750 LearningRate 0.0133 Epoch: 12 Global Step: 157970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:24:53,446-Speed 3222.89 samples/sec Loss 3.0049 LearningRate 0.0133 Epoch: 12 Global Step: 157980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:24:56,531-Speed 3320.00 samples/sec Loss 3.0798 LearningRate 0.0133 Epoch: 12 Global Step: 157990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:24:59,620-Speed 3316.26 samples/sec Loss 3.0672 LearningRate 0.0132 Epoch: 12 Global Step: 158000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:02,705-Speed 3320.34 samples/sec Loss 3.1423 LearningRate 0.0132 Epoch: 12 Global Step: 158010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:05,874-Speed 3232.73 samples/sec Loss 3.1028 LearningRate 0.0132 Epoch: 12 Global Step: 158020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:08,960-Speed 3319.28 samples/sec Loss 3.0761 LearningRate 0.0132 Epoch: 12 Global Step: 158030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:12,017-Speed 3349.97 samples/sec Loss 3.0459 LearningRate 0.0132 Epoch: 12 Global Step: 158040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:15,172-Speed 3247.46 samples/sec Loss 3.1288 LearningRate 0.0132 Epoch: 12 Global Step: 158050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:25:18,297-Speed 3277.81 samples/sec Loss 3.0614 LearningRate 0.0132 Epoch: 12 Global Step: 158060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:21,395-Speed 3306.04 samples/sec Loss 3.0620 LearningRate 0.0132 Epoch: 12 Global Step: 158070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:24,540-Speed 3256.99 samples/sec Loss 3.0471 LearningRate 0.0132 Epoch: 12 Global Step: 158080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:27,644-Speed 3300.12 samples/sec Loss 3.0373 LearningRate 0.0132 Epoch: 12 Global Step: 158090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:30,726-Speed 3324.10 samples/sec Loss 3.0736 LearningRate 0.0132 Epoch: 12 Global Step: 158100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:33,790-Speed 3342.54 samples/sec Loss 3.0983 LearningRate 0.0132 Epoch: 12 Global Step: 158110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:36,873-Speed 3322.54 samples/sec Loss 3.0493 LearningRate 0.0132 Epoch: 12 Global Step: 158120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:25:39,948-Speed 3331.49 samples/sec Loss 3.0719 LearningRate 0.0132 Epoch: 12 Global Step: 158130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:25:43,033-Speed 3320.33 samples/sec Loss 3.0357 LearningRate 0.0132 Epoch: 12 Global Step: 158140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:25:46,093-Speed 3347.43 samples/sec Loss 3.0832 LearningRate 0.0132 Epoch: 12 Global Step: 158150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:25:49,174-Speed 3324.66 samples/sec Loss 3.1353 LearningRate 0.0132 Epoch: 12 Global Step: 158160 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:25:52,291-Speed 3286.11 samples/sec Loss 3.0957 LearningRate 0.0132 Epoch: 12 Global Step: 158170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:25:55,360-Speed 3337.52 samples/sec Loss 
3.0843 LearningRate 0.0132 Epoch: 12 Global Step: 158180 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:25:58,470-Speed 3294.25 samples/sec Loss 3.1924 LearningRate 0.0132 Epoch: 12 Global Step: 158190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:01,523-Speed 3355.48 samples/sec Loss 3.1154 LearningRate 0.0132 Epoch: 12 Global Step: 158200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:04,630-Speed 3296.13 samples/sec Loss 3.0175 LearningRate 0.0132 Epoch: 12 Global Step: 158210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:07,743-Speed 3291.40 samples/sec Loss 3.0880 LearningRate 0.0132 Epoch: 12 Global Step: 158220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:10,820-Speed 3328.15 samples/sec Loss 3.1119 LearningRate 0.0132 Epoch: 12 Global Step: 158230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:26:13,978-Speed 3243.79 samples/sec Loss 3.1649 LearningRate 0.0132 Epoch: 12 Global Step: 158240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:26:17,049-Speed 3335.36 samples/sec Loss 3.0741 LearningRate 0.0132 Epoch: 12 Global Step: 158250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:26:20,082-Speed 3376.96 samples/sec Loss 2.9872 LearningRate 0.0132 Epoch: 12 Global Step: 158260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:23,179-Speed 3307.72 samples/sec Loss 2.9503 LearningRate 0.0132 Epoch: 12 Global Step: 158270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:26,261-Speed 3324.42 samples/sec Loss 3.0628 LearningRate 0.0132 Epoch: 12 Global Step: 158280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:29,333-Speed 3334.26 samples/sec Loss 3.0546 LearningRate 0.0132 Epoch: 12 Global Step: 158290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:32,425-Speed 3312.90 samples/sec Loss 3.0914 LearningRate 0.0132 Epoch: 12 Global 
Step: 158300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:35,534-Speed 3293.96 samples/sec Loss 3.1202 LearningRate 0.0132 Epoch: 12 Global Step: 158310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:38,611-Speed 3329.47 samples/sec Loss 3.0818 LearningRate 0.0132 Epoch: 12 Global Step: 158320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:41,668-Speed 3350.74 samples/sec Loss 3.0294 LearningRate 0.0132 Epoch: 12 Global Step: 158330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:44,749-Speed 3323.91 samples/sec Loss 2.9938 LearningRate 0.0131 Epoch: 12 Global Step: 158340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:47,817-Speed 3338.97 samples/sec Loss 3.1119 LearningRate 0.0131 Epoch: 12 Global Step: 158350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:26:50,910-Speed 3312.60 samples/sec Loss 3.0914 LearningRate 0.0131 Epoch: 12 Global Step: 158360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:26:54,053-Speed 3258.79 samples/sec Loss 3.0793 LearningRate 0.0131 Epoch: 12 Global Step: 158370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:26:57,116-Speed 3344.41 samples/sec Loss 3.0562 LearningRate 0.0131 Epoch: 12 Global Step: 158380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:27:00,198-Speed 3322.51 samples/sec Loss 3.0275 LearningRate 0.0131 Epoch: 12 Global Step: 158390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:27:03,341-Speed 3259.79 samples/sec Loss 3.0905 LearningRate 0.0131 Epoch: 12 Global Step: 158400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:27:06,378-Speed 3372.67 samples/sec Loss 3.1291 LearningRate 0.0131 Epoch: 12 Global Step: 158410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:09,442-Speed 3342.80 samples/sec Loss 3.0278 LearningRate 0.0131 Epoch: 12 Global Step: 158420 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 15:27:12,505-Speed 3345.02 samples/sec Loss 3.0257 LearningRate 0.0131 Epoch: 12 Global Step: 158430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:15,631-Speed 3276.65 samples/sec Loss 3.0620 LearningRate 0.0131 Epoch: 12 Global Step: 158440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:18,763-Speed 3270.58 samples/sec Loss 3.0988 LearningRate 0.0131 Epoch: 12 Global Step: 158450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:21,873-Speed 3293.49 samples/sec Loss 3.0932 LearningRate 0.0131 Epoch: 12 Global Step: 158460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:24,978-Speed 3298.34 samples/sec Loss 3.0984 LearningRate 0.0131 Epoch: 12 Global Step: 158470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:28,087-Speed 3295.23 samples/sec Loss 3.0764 LearningRate 0.0131 Epoch: 12 Global Step: 158480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:31,160-Speed 3332.88 samples/sec Loss 3.0472 LearningRate 0.0131 Epoch: 12 Global Step: 158490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:34,240-Speed 3325.63 samples/sec Loss 3.0754 LearningRate 0.0131 Epoch: 12 Global Step: 158500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:37,408-Speed 3234.20 samples/sec Loss 3.0696 LearningRate 0.0131 Epoch: 12 Global Step: 158510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:27:40,512-Speed 3299.54 samples/sec Loss 3.1751 LearningRate 0.0131 Epoch: 12 Global Step: 158520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:43,582-Speed 3336.60 samples/sec Loss 3.0756 LearningRate 0.0131 Epoch: 12 Global Step: 158530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:46,706-Speed 3279.42 samples/sec Loss 3.0471 LearningRate 0.0131 Epoch: 12 Global Step: 158540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
15:27:49,787-Speed 3324.31 samples/sec Loss 3.0841 LearningRate 0.0131 Epoch: 12 Global Step: 158550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:52,901-Speed 3289.55 samples/sec Loss 3.1388 LearningRate 0.0131 Epoch: 12 Global Step: 158560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:55,990-Speed 3316.56 samples/sec Loss 3.0841 LearningRate 0.0131 Epoch: 12 Global Step: 158570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:27:59,056-Speed 3340.45 samples/sec Loss 3.0662 LearningRate 0.0131 Epoch: 12 Global Step: 158580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:02,170-Speed 3290.05 samples/sec Loss 3.1537 LearningRate 0.0131 Epoch: 12 Global Step: 158590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:05,239-Speed 3337.85 samples/sec Loss 3.0511 LearningRate 0.0131 Epoch: 12 Global Step: 158600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:08,387-Speed 3253.90 samples/sec Loss 3.0554 LearningRate 0.0131 Epoch: 12 Global Step: 158610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:11,460-Speed 3333.74 samples/sec Loss 3.1133 LearningRate 0.0131 Epoch: 12 Global Step: 158620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:28:14,562-Speed 3301.50 samples/sec Loss 3.1095 LearningRate 0.0131 Epoch: 12 Global Step: 158630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:28:17,680-Speed 3285.07 samples/sec Loss 3.0286 LearningRate 0.0131 Epoch: 12 Global Step: 158640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:20,744-Speed 3342.75 samples/sec Loss 3.0265 LearningRate 0.0131 Epoch: 12 Global Step: 158650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:23,824-Speed 3326.51 samples/sec Loss 3.0758 LearningRate 0.0131 Epoch: 12 Global Step: 158660 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:26,920-Speed 3308.58 samples/sec Loss 
3.1695 LearningRate 0.0131 Epoch: 12 Global Step: 158670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:30,089-Speed 3231.72 samples/sec Loss 3.1147 LearningRate 0.0130 Epoch: 12 Global Step: 158680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:33,165-Speed 3330.38 samples/sec Loss 3.0534 LearningRate 0.0130 Epoch: 12 Global Step: 158690 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:36,268-Speed 3300.92 samples/sec Loss 3.0559 LearningRate 0.0130 Epoch: 12 Global Step: 158700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:39,391-Speed 3280.00 samples/sec Loss 3.0391 LearningRate 0.0130 Epoch: 12 Global Step: 158710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:42,478-Speed 3318.15 samples/sec Loss 3.0592 LearningRate 0.0130 Epoch: 12 Global Step: 158720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:45,536-Speed 3349.75 samples/sec Loss 2.9887 LearningRate 0.0130 Epoch: 12 Global Step: 158730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:48,660-Speed 3279.17 samples/sec Loss 3.0380 LearningRate 0.0130 Epoch: 12 Global Step: 158740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:28:51,773-Speed 3289.92 samples/sec Loss 3.0926 LearningRate 0.0130 Epoch: 12 Global Step: 158750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:28:54,844-Speed 3335.14 samples/sec Loss 3.1240 LearningRate 0.0130 Epoch: 12 Global Step: 158760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:28:57,903-Speed 3348.71 samples/sec Loss 3.0965 LearningRate 0.0130 Epoch: 12 Global Step: 158770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:00,982-Speed 3326.48 samples/sec Loss 3.0804 LearningRate 0.0130 Epoch: 12 Global Step: 158780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:04,047-Speed 3342.25 samples/sec Loss 2.9991 LearningRate 0.0130 Epoch: 12 Global 
Step: 158790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:07,122-Speed 3331.30 samples/sec Loss 3.0460 LearningRate 0.0130 Epoch: 12 Global Step: 158800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:10,193-Speed 3335.74 samples/sec Loss 3.0970 LearningRate 0.0130 Epoch: 12 Global Step: 158810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:13,365-Speed 3229.19 samples/sec Loss 3.0430 LearningRate 0.0130 Epoch: 12 Global Step: 158820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:16,422-Speed 3350.53 samples/sec Loss 3.1144 LearningRate 0.0130 Epoch: 12 Global Step: 158830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:19,506-Speed 3321.99 samples/sec Loss 3.0369 LearningRate 0.0130 Epoch: 12 Global Step: 158840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:22,620-Speed 3289.03 samples/sec Loss 3.1016 LearningRate 0.0130 Epoch: 12 Global Step: 158850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:25,721-Speed 3303.19 samples/sec Loss 3.0306 LearningRate 0.0130 Epoch: 12 Global Step: 158860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:29:28,800-Speed 3327.36 samples/sec Loss 3.0176 LearningRate 0.0130 Epoch: 12 Global Step: 158870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:29:31,878-Speed 3327.21 samples/sec Loss 3.0849 LearningRate 0.0130 Epoch: 12 Global Step: 158880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:29:34,981-Speed 3301.78 samples/sec Loss 3.0670 LearningRate 0.0130 Epoch: 12 Global Step: 158890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:29:38,038-Speed 3350.00 samples/sec Loss 3.1731 LearningRate 0.0130 Epoch: 12 Global Step: 158900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:29:41,099-Speed 3346.31 samples/sec Loss 2.9496 LearningRate 0.0130 Epoch: 12 Global Step: 158910 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 15:29:44,164-Speed 3343.10 samples/sec Loss 3.1081 LearningRate 0.0130 Epoch: 12 Global Step: 158920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:29:47,247-Speed 3322.28 samples/sec Loss 3.0773 LearningRate 0.0130 Epoch: 12 Global Step: 158930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:29:50,346-Speed 3305.12 samples/sec Loss 3.1128 LearningRate 0.0130 Epoch: 12 Global Step: 158940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:53,441-Speed 3309.71 samples/sec Loss 3.0316 LearningRate 0.0130 Epoch: 12 Global Step: 158950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:56,507-Speed 3341.21 samples/sec Loss 3.1596 LearningRate 0.0130 Epoch: 12 Global Step: 158960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:29:59,595-Speed 3316.18 samples/sec Loss 3.0224 LearningRate 0.0130 Epoch: 12 Global Step: 158970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:02,737-Speed 3259.99 samples/sec Loss 3.0825 LearningRate 0.0130 Epoch: 12 Global Step: 158980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:05,834-Speed 3307.64 samples/sec Loss 3.0374 LearningRate 0.0130 Epoch: 12 Global Step: 158990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:08,898-Speed 3343.69 samples/sec Loss 3.1147 LearningRate 0.0130 Epoch: 12 Global Step: 159000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:11,972-Speed 3331.93 samples/sec Loss 3.1034 LearningRate 0.0130 Epoch: 12 Global Step: 159010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:15,038-Speed 3341.01 samples/sec Loss 3.0552 LearningRate 0.0130 Epoch: 12 Global Step: 159020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:18,135-Speed 3307.55 samples/sec Loss 3.0335 LearningRate 0.0129 Epoch: 12 Global Step: 159030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
15:30:21,216-Speed 3323.93 samples/sec Loss 3.1429 LearningRate 0.0129 Epoch: 12 Global Step: 159040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:30:24,315-Speed 3305.82 samples/sec Loss 3.1022 LearningRate 0.0129 Epoch: 12 Global Step: 159050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:30:27,369-Speed 3354.67 samples/sec Loss 3.1371 LearningRate 0.0129 Epoch: 12 Global Step: 159060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:30:30,422-Speed 3354.64 samples/sec Loss 3.0705 LearningRate 0.0129 Epoch: 12 Global Step: 159070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:30:33,498-Speed 3330.27 samples/sec Loss 3.1354 LearningRate 0.0129 Epoch: 12 Global Step: 159080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:30:36,570-Speed 3334.06 samples/sec Loss 3.0738 LearningRate 0.0129 Epoch: 12 Global Step: 159090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:39,676-Speed 3298.31 samples/sec Loss 3.1703 LearningRate 0.0129 Epoch: 12 Global Step: 159100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:30:42,725-Speed 3359.10 samples/sec Loss 3.0672 LearningRate 0.0129 Epoch: 12 Global Step: 159110 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:30:45,799-Speed 3332.46 samples/sec Loss 3.1511 LearningRate 0.0129 Epoch: 12 Global Step: 159120 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:30:48,910-Speed 3292.92 samples/sec Loss 3.1934 LearningRate 0.0129 Epoch: 12 Global Step: 159130 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:30:52,050-Speed 3261.76 samples/sec Loss 3.0873 LearningRate 0.0129 Epoch: 12 Global Step: 159140 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:30:55,176-Speed 3276.99 samples/sec Loss 3.0326 LearningRate 0.0129 Epoch: 12 Global Step: 159150 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:30:58,336-Speed 3241.47 samples/sec Loss 
3.0893 LearningRate 0.0129 Epoch: 12 Global Step: 159160 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:31:01,450-Speed 3290.37 samples/sec Loss 3.0933 LearningRate 0.0129 Epoch: 12 Global Step: 159170 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:31:04,582-Speed 3270.66 samples/sec Loss 3.0400 LearningRate 0.0129 Epoch: 12 Global Step: 159180 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:31:07,658-Speed 3329.47 samples/sec Loss 3.0958 LearningRate 0.0129 Epoch: 12 Global Step: 159190 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:31:10,770-Speed 3291.64 samples/sec Loss 3.0008 LearningRate 0.0129 Epoch: 12 Global Step: 159200 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:31:13,856-Speed 3319.86 samples/sec Loss 3.0294 LearningRate 0.0129 Epoch: 12 Global Step: 159210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:16,966-Speed 3293.73 samples/sec Loss 3.0991 LearningRate 0.0129 Epoch: 12 Global Step: 159220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:20,089-Speed 3279.80 samples/sec Loss 3.0614 LearningRate 0.0129 Epoch: 12 Global Step: 159230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:23,183-Speed 3310.84 samples/sec Loss 3.0318 LearningRate 0.0129 Epoch: 12 Global Step: 159240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:26,275-Speed 3312.09 samples/sec Loss 3.0898 LearningRate 0.0129 Epoch: 12 Global Step: 159250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:29,392-Speed 3286.84 samples/sec Loss 3.1011 LearningRate 0.0129 Epoch: 12 Global Step: 159260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:32,531-Speed 3262.99 samples/sec Loss 3.0861 LearningRate 0.0129 Epoch: 12 Global Step: 159270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:35,614-Speed 3322.60 samples/sec Loss 3.0388 LearningRate 0.0129 Epoch: 12 Global 
Step: 159280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:38,669-Speed 3353.47 samples/sec Loss 3.0838 LearningRate 0.0129 Epoch: 12 Global Step: 159290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:41,764-Speed 3308.91 samples/sec Loss 3.0936 LearningRate 0.0129 Epoch: 12 Global Step: 159300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:44,890-Speed 3276.56 samples/sec Loss 3.0658 LearningRate 0.0129 Epoch: 12 Global Step: 159310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:31:48,028-Speed 3264.42 samples/sec Loss 3.0170 LearningRate 0.0129 Epoch: 12 Global Step: 159320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:31:51,193-Speed 3236.80 samples/sec Loss 3.1224 LearningRate 0.0129 Epoch: 12 Global Step: 159330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:54,283-Speed 3315.60 samples/sec Loss 3.0886 LearningRate 0.0129 Epoch: 12 Global Step: 159340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:31:57,372-Speed 3315.58 samples/sec Loss 3.1238 LearningRate 0.0129 Epoch: 12 Global Step: 159350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:32:00,458-Speed 3318.75 samples/sec Loss 3.1079 LearningRate 0.0129 Epoch: 12 Global Step: 159360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:32:03,590-Speed 3270.48 samples/sec Loss 3.1814 LearningRate 0.0128 Epoch: 12 Global Step: 159370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:32:06,760-Speed 3231.72 samples/sec Loss 3.0700 LearningRate 0.0128 Epoch: 12 Global Step: 159380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:32:09,850-Speed 3314.57 samples/sec Loss 3.1149 LearningRate 0.0128 Epoch: 12 Global Step: 159390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:32:12,970-Speed 3283.47 samples/sec Loss 3.0846 LearningRate 0.0128 Epoch: 12 Global Step: 159400 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 15:32:16,115-Speed 3256.71 samples/sec Loss 3.0875 LearningRate 0.0128 Epoch: 12 Global Step: 159410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:32:19,283-Speed 3234.18 samples/sec Loss 3.0800 LearningRate 0.0128 Epoch: 12 Global Step: 159420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:32:22,341-Speed 3349.07 samples/sec Loss 3.0719 LearningRate 0.0128 Epoch: 12 Global Step: 159430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:25,460-Speed 3284.56 samples/sec Loss 3.0380 LearningRate 0.0128 Epoch: 12 Global Step: 159440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:28,555-Speed 3310.03 samples/sec Loss 3.0450 LearningRate 0.0128 Epoch: 12 Global Step: 159450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:31,712-Speed 3243.98 samples/sec Loss 3.0523 LearningRate 0.0128 Epoch: 12 Global Step: 159460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:34,834-Speed 3281.24 samples/sec Loss 3.0974 LearningRate 0.0128 Epoch: 12 Global Step: 159470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:37,899-Speed 3341.69 samples/sec Loss 3.0322 LearningRate 0.0128 Epoch: 12 Global Step: 159480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:41,007-Speed 3296.67 samples/sec Loss 3.0956 LearningRate 0.0128 Epoch: 12 Global Step: 159490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:44,137-Speed 3272.76 samples/sec Loss 2.9946 LearningRate 0.0128 Epoch: 12 Global Step: 159500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:47,220-Speed 3321.70 samples/sec Loss 3.0755 LearningRate 0.0128 Epoch: 12 Global Step: 159510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:32:50,299-Speed 3327.71 samples/sec Loss 3.0423 LearningRate 0.0128 Epoch: 12 Global Step: 159520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:32:53,436-Speed 3264.99 samples/sec Loss 3.0563 LearningRate 0.0128 Epoch: 12 Global Step: 159530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:32:56,492-Speed 3352.19 samples/sec Loss 3.0476 LearningRate 0.0128 Epoch: 12 Global Step: 159540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:32:59,637-Speed 3256.22 samples/sec Loss 2.9970 LearningRate 0.0128 Epoch: 12 Global Step: 159550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:33:02,756-Speed 3284.29 samples/sec Loss 3.1747 LearningRate 0.0128 Epoch: 12 Global Step: 159560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:05,804-Speed 3360.64 samples/sec Loss 3.0420 LearningRate 0.0128 Epoch: 12 Global Step: 159570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:08,846-Speed 3367.41 samples/sec Loss 3.0729 LearningRate 0.0128 Epoch: 12 Global Step: 159580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:11,910-Speed 3342.67 samples/sec Loss 3.0862 LearningRate 0.0128 Epoch: 12 Global Step: 159590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:15,019-Speed 3295.05 samples/sec Loss 3.1653 LearningRate 0.0128 Epoch: 12 Global Step: 159600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:18,123-Speed 3299.92 samples/sec Loss 3.0497 LearningRate 0.0128 Epoch: 12 Global Step: 159610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:21,181-Speed 3349.96 samples/sec Loss 3.1297 LearningRate 0.0128 Epoch: 12 Global Step: 159620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:24,299-Speed 3284.84 samples/sec Loss 3.1048 LearningRate 0.0128 Epoch: 12 Global Step: 159630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:27,472-Speed 3228.67 samples/sec Loss 3.0446 LearningRate 0.0128 Epoch: 12 Global Step: 159640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:30,570-Speed 3306.58 samples/sec Loss 
3.0099 LearningRate 0.0128 Epoch: 12 Global Step: 159650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:33,639-Speed 3337.23 samples/sec Loss 3.1631 LearningRate 0.0128 Epoch: 12 Global Step: 159660 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:36,739-Speed 3304.32 samples/sec Loss 3.0700 LearningRate 0.0128 Epoch: 12 Global Step: 159670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:33:39,854-Speed 3288.81 samples/sec Loss 3.1651 LearningRate 0.0128 Epoch: 12 Global Step: 159680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:43,004-Speed 3251.55 samples/sec Loss 3.0552 LearningRate 0.0128 Epoch: 12 Global Step: 159690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:46,068-Speed 3343.29 samples/sec Loss 3.0794 LearningRate 0.0128 Epoch: 12 Global Step: 159700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:49,224-Speed 3245.32 samples/sec Loss 3.2013 LearningRate 0.0128 Epoch: 12 Global Step: 159710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:52,347-Speed 3279.87 samples/sec Loss 3.0908 LearningRate 0.0127 Epoch: 12 Global Step: 159720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:55,511-Speed 3237.58 samples/sec Loss 3.1248 LearningRate 0.0127 Epoch: 12 Global Step: 159730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:33:58,636-Speed 3278.04 samples/sec Loss 3.0698 LearningRate 0.0127 Epoch: 12 Global Step: 159740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:01,781-Speed 3256.85 samples/sec Loss 3.1156 LearningRate 0.0127 Epoch: 12 Global Step: 159750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:04,886-Speed 3299.06 samples/sec Loss 3.0641 LearningRate 0.0127 Epoch: 12 Global Step: 159760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:07,985-Speed 3304.87 samples/sec Loss 3.1298 LearningRate 0.0127 Epoch: 12 Global 
Step: 159770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:11,105-Speed 3283.71 samples/sec Loss 2.9962 LearningRate 0.0127 Epoch: 12 Global Step: 159780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 15:34:14,252-Speed 3254.27 samples/sec Loss 3.1429 LearningRate 0.0127 Epoch: 12 Global Step: 159790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:17,435-Speed 3218.86 samples/sec Loss 3.0672 LearningRate 0.0127 Epoch: 12 Global Step: 159800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:20,521-Speed 3319.36 samples/sec Loss 3.0489 LearningRate 0.0127 Epoch: 12 Global Step: 159810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:23,558-Speed 3372.32 samples/sec Loss 3.1568 LearningRate 0.0127 Epoch: 12 Global Step: 159820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:26,628-Speed 3337.58 samples/sec Loss 2.9713 LearningRate 0.0127 Epoch: 12 Global Step: 159830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:29,692-Speed 3342.30 samples/sec Loss 3.1319 LearningRate 0.0127 Epoch: 12 Global Step: 159840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:32,758-Speed 3341.17 samples/sec Loss 3.0236 LearningRate 0.0127 Epoch: 12 Global Step: 159850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:35,914-Speed 3245.53 samples/sec Loss 3.0719 LearningRate 0.0127 Epoch: 12 Global Step: 159860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:39,019-Speed 3298.55 samples/sec Loss 3.0367 LearningRate 0.0127 Epoch: 12 Global Step: 159870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:42,241-Speed 3179.56 samples/sec Loss 3.0752 LearningRate 0.0127 Epoch: 12 Global Step: 159880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:45,294-Speed 3354.48 samples/sec Loss 3.0089 LearningRate 0.0127 Epoch: 12 Global Step: 159890 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 15:34:48,382-Speed 3317.67 samples/sec Loss 3.0953 LearningRate 0.0127 Epoch: 12 Global Step: 159900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:51,565-Speed 3217.90 samples/sec Loss 3.0461 LearningRate 0.0127 Epoch: 12 Global Step: 159910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:34:54,672-Speed 3297.07 samples/sec Loss 3.0086 LearningRate 0.0127 Epoch: 12 Global Step: 159920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:34:57,769-Speed 3307.66 samples/sec Loss 3.0565 LearningRate 0.0127 Epoch: 12 Global Step: 159930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:35:00,824-Speed 3353.24 samples/sec Loss 3.1050 LearningRate 0.0127 Epoch: 12 Global Step: 159940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:04,002-Speed 3222.99 samples/sec Loss 3.1036 LearningRate 0.0127 Epoch: 12 Global Step: 159950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:07,183-Speed 3220.52 samples/sec Loss 3.1181 LearningRate 0.0127 Epoch: 12 Global Step: 159960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:10,274-Speed 3313.91 samples/sec Loss 3.2073 LearningRate 0.0127 Epoch: 12 Global Step: 159970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:13,331-Speed 3349.90 samples/sec Loss 3.0584 LearningRate 0.0127 Epoch: 12 Global Step: 159980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:16,526-Speed 3206.66 samples/sec Loss 3.1236 LearningRate 0.0127 Epoch: 12 Global Step: 159990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:19,616-Speed 3314.18 samples/sec Loss 3.0552 LearningRate 0.0127 Epoch: 12 Global Step: 160000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:22,699-Speed 3323.37 samples/sec Loss 3.0458 LearningRate 0.0127 Epoch: 12 Global Step: 160010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
15:35:25,911-Speed 3188.88 samples/sec Loss 3.1214 LearningRate 0.0127 Epoch: 12 Global Step: 160020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:29,048-Speed 3264.88 samples/sec Loss 3.0933 LearningRate 0.0127 Epoch: 12 Global Step: 160030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:32,155-Speed 3296.57 samples/sec Loss 3.0103 LearningRate 0.0127 Epoch: 12 Global Step: 160040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:35:35,279-Speed 3279.68 samples/sec Loss 3.0584 LearningRate 0.0127 Epoch: 12 Global Step: 160050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:35:38,372-Speed 3311.68 samples/sec Loss 3.0522 LearningRate 0.0127 Epoch: 12 Global Step: 160060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:35:41,490-Speed 3284.63 samples/sec Loss 3.0623 LearningRate 0.0126 Epoch: 12 Global Step: 160070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:35:44,607-Speed 3286.60 samples/sec Loss 3.0742 LearningRate 0.0126 Epoch: 12 Global Step: 160080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:47,746-Speed 3262.54 samples/sec Loss 3.1185 LearningRate 0.0126 Epoch: 12 Global Step: 160090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:50,873-Speed 3276.48 samples/sec Loss 3.1073 LearningRate 0.0126 Epoch: 12 Global Step: 160100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:54,030-Speed 3243.70 samples/sec Loss 3.1296 LearningRate 0.0126 Epoch: 12 Global Step: 160110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:35:57,950-Speed 2613.23 samples/sec Loss 3.0945 LearningRate 0.0126 Epoch: 12 Global Step: 160120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:36:00,998-Speed 3360.75 samples/sec Loss 3.1317 LearningRate 0.0126 Epoch: 12 Global Step: 160130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:36:04,186-Speed 3212.95 samples/sec Loss 
3.1397 LearningRate 0.0126 Epoch: 12 Global Step: 160140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:36:07,271-Speed 3320.34 samples/sec Loss 3.1383 LearningRate 0.0126 Epoch: 12 Global Step: 160150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:36:10,417-Speed 3256.12 samples/sec Loss 3.1087 LearningRate 0.0126 Epoch: 12 Global Step: 160160 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:36:13,569-Speed 3250.32 samples/sec Loss 3.0972 LearningRate 0.0126 Epoch: 12 Global Step: 160170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:36:16,692-Speed 3279.01 samples/sec Loss 3.0807 LearningRate 0.0126 Epoch: 12 Global Step: 160180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:19,778-Speed 3319.72 samples/sec Loss 3.1069 LearningRate 0.0126 Epoch: 12 Global Step: 160190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:22,859-Speed 3324.67 samples/sec Loss 3.0845 LearningRate 0.0126 Epoch: 12 Global Step: 160200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:25,971-Speed 3291.56 samples/sec Loss 3.1270 LearningRate 0.0126 Epoch: 12 Global Step: 160210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:29,077-Speed 3297.83 samples/sec Loss 3.0827 LearningRate 0.0126 Epoch: 12 Global Step: 160220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:32,177-Speed 3304.00 samples/sec Loss 3.1021 LearningRate 0.0126 Epoch: 12 Global Step: 160230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:35,253-Speed 3331.00 samples/sec Loss 3.0685 LearningRate 0.0126 Epoch: 12 Global Step: 160240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:38,354-Speed 3302.79 samples/sec Loss 3.1909 LearningRate 0.0126 Epoch: 12 Global Step: 160250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:41,508-Speed 3247.46 samples/sec Loss 3.0597 LearningRate 0.0126 Epoch: 12 Global 
Step: 160260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:44,628-Speed 3283.52 samples/sec Loss 3.0525 LearningRate 0.0126 Epoch: 12 Global Step: 160270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:47,723-Speed 3309.61 samples/sec Loss 3.1375 LearningRate 0.0126 Epoch: 12 Global Step: 160280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:50,824-Speed 3303.21 samples/sec Loss 3.1303 LearningRate 0.0126 Epoch: 12 Global Step: 160290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:53,897-Speed 3332.32 samples/sec Loss 3.0288 LearningRate 0.0126 Epoch: 12 Global Step: 160300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:36:56,975-Speed 3328.18 samples/sec Loss 3.0740 LearningRate 0.0126 Epoch: 12 Global Step: 160310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:00,093-Speed 3285.50 samples/sec Loss 3.0254 LearningRate 0.0126 Epoch: 12 Global Step: 160320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:03,182-Speed 3316.38 samples/sec Loss 3.0718 LearningRate 0.0126 Epoch: 12 Global Step: 160330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:06,282-Speed 3303.70 samples/sec Loss 3.1212 LearningRate 0.0126 Epoch: 12 Global Step: 160340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:09,351-Speed 3337.88 samples/sec Loss 3.1695 LearningRate 0.0126 Epoch: 12 Global Step: 160350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:12,472-Speed 3281.85 samples/sec Loss 3.1416 LearningRate 0.0126 Epoch: 12 Global Step: 160360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:15,596-Speed 3279.58 samples/sec Loss 3.0916 LearningRate 0.0126 Epoch: 12 Global Step: 160370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:18,702-Speed 3297.40 samples/sec Loss 3.0547 LearningRate 0.0126 Epoch: 12 Global Step: 160380 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 15:37:21,778-Speed 3330.88 samples/sec Loss 3.0955 LearningRate 0.0126 Epoch: 12 Global Step: 160390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:24,911-Speed 3269.53 samples/sec Loss 3.0061 LearningRate 0.0126 Epoch: 12 Global Step: 160400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:28,041-Speed 3271.79 samples/sec Loss 3.0548 LearningRate 0.0126 Epoch: 12 Global Step: 160410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:31,134-Speed 3311.89 samples/sec Loss 3.0504 LearningRate 0.0125 Epoch: 12 Global Step: 160420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:34,261-Speed 3275.64 samples/sec Loss 3.1051 LearningRate 0.0125 Epoch: 12 Global Step: 160430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:37,398-Speed 3264.83 samples/sec Loss 3.0205 LearningRate 0.0125 Epoch: 12 Global Step: 160440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:40,509-Speed 3292.71 samples/sec Loss 3.1122 LearningRate 0.0125 Epoch: 12 Global Step: 160450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:37:43,666-Speed 3244.61 samples/sec Loss 3.0781 LearningRate 0.0125 Epoch: 12 Global Step: 160460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:46,731-Speed 3341.91 samples/sec Loss 2.9825 LearningRate 0.0125 Epoch: 12 Global Step: 160470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:49,820-Speed 3315.93 samples/sec Loss 3.1080 LearningRate 0.0125 Epoch: 12 Global Step: 160480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:52,908-Speed 3317.58 samples/sec Loss 3.1635 LearningRate 0.0125 Epoch: 12 Global Step: 160490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:37:55,996-Speed 3317.06 samples/sec Loss 3.0909 LearningRate 0.0125 Epoch: 12 Global Step: 160500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
15:37:59,045-Speed 3359.68 samples/sec Loss 3.0762 LearningRate 0.0125 Epoch: 12 Global Step: 160510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:38:02,188-Speed 3259.09 samples/sec Loss 3.0747 LearningRate 0.0125 Epoch: 12 Global Step: 160520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:38:05,293-Speed 3298.80 samples/sec Loss 3.0297 LearningRate 0.0125 Epoch: 12 Global Step: 160530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:38:08,398-Speed 3300.00 samples/sec Loss 3.0994 LearningRate 0.0125 Epoch: 12 Global Step: 160540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:38:11,443-Speed 3363.46 samples/sec Loss 3.0825 LearningRate 0.0125 Epoch: 12 Global Step: 160550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:14,618-Speed 3226.06 samples/sec Loss 3.1101 LearningRate 0.0125 Epoch: 12 Global Step: 160560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:17,779-Speed 3240.37 samples/sec Loss 3.0330 LearningRate 0.0125 Epoch: 12 Global Step: 160570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:20,877-Speed 3306.81 samples/sec Loss 3.1700 LearningRate 0.0125 Epoch: 12 Global Step: 160580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:24,016-Speed 3262.90 samples/sec Loss 3.0706 LearningRate 0.0125 Epoch: 12 Global Step: 160590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:27,139-Speed 3279.72 samples/sec Loss 3.0651 LearningRate 0.0125 Epoch: 12 Global Step: 160600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:30,239-Speed 3304.61 samples/sec Loss 3.1250 LearningRate 0.0125 Epoch: 12 Global Step: 160610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:33,356-Speed 3286.63 samples/sec Loss 3.1049 LearningRate 0.0125 Epoch: 12 Global Step: 160620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:36,529-Speed 3227.90 samples/sec Loss 
3.1482 LearningRate 0.0125 Epoch: 12 Global Step: 160630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:39,667-Speed 3264.50 samples/sec Loss 3.0531 LearningRate 0.0125 Epoch: 12 Global Step: 160640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:42,782-Speed 3288.19 samples/sec Loss 3.0874 LearningRate 0.0125 Epoch: 12 Global Step: 160650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:45,866-Speed 3321.06 samples/sec Loss 3.0190 LearningRate 0.0125 Epoch: 12 Global Step: 160660 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:48,952-Speed 3319.21 samples/sec Loss 3.1340 LearningRate 0.0125 Epoch: 12 Global Step: 160670 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:52,149-Speed 3204.32 samples/sec Loss 3.0827 LearningRate 0.0125 Epoch: 12 Global Step: 160680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:38:55,235-Speed 3319.15 samples/sec Loss 3.0783 LearningRate 0.0125 Epoch: 12 Global Step: 160690 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:38:58,338-Speed 3301.65 samples/sec Loss 3.0789 LearningRate 0.0125 Epoch: 12 Global Step: 160700 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:01,467-Speed 3272.63 samples/sec Loss 3.0998 LearningRate 0.0125 Epoch: 12 Global Step: 160710 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:04,634-Speed 3235.31 samples/sec Loss 2.9887 LearningRate 0.0125 Epoch: 12 Global Step: 160720 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:07,803-Speed 3232.13 samples/sec Loss 3.0415 LearningRate 0.0125 Epoch: 12 Global Step: 160730 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:10,855-Speed 3355.48 samples/sec Loss 3.0837 LearningRate 0.0125 Epoch: 12 Global Step: 160740 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:13,935-Speed 3326.44 samples/sec Loss 3.1265 LearningRate 0.0125 Epoch: 12 Global Step: 
160750 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:17,041-Speed 3297.71 samples/sec Loss 3.1030 LearningRate 0.0125 Epoch: 12 Global Step: 160760 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:20,153-Speed 3291.37 samples/sec Loss 3.1317 LearningRate 0.0124 Epoch: 12 Global Step: 160770 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:23,325-Speed 3229.95 samples/sec Loss 3.1366 LearningRate 0.0124 Epoch: 12 Global Step: 160780 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:26,463-Speed 3263.35 samples/sec Loss 3.0548 LearningRate 0.0124 Epoch: 12 Global Step: 160790 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:29,600-Speed 3265.38 samples/sec Loss 3.0861 LearningRate 0.0124 Epoch: 12 Global Step: 160800 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:32,728-Speed 3275.02 samples/sec Loss 3.1266 LearningRate 0.0124 Epoch: 12 Global Step: 160810 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:35,864-Speed 3265.98 samples/sec Loss 3.0540 LearningRate 0.0124 Epoch: 12 Global Step: 160820 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:39,004-Speed 3261.82 samples/sec Loss 3.0864 LearningRate 0.0124 Epoch: 12 Global Step: 160830 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:42,111-Speed 3298.00 samples/sec Loss 3.1325 LearningRate 0.0124 Epoch: 12 Global Step: 160840 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:45,158-Speed 3361.50 samples/sec Loss 3.1648 LearningRate 0.0124 Epoch: 12 Global Step: 160850 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:48,270-Speed 3291.36 samples/sec Loss 3.1432 LearningRate 0.0124 Epoch: 12 Global Step: 160860 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:51,366-Speed 3308.50 samples/sec Loss 3.0467 LearningRate 0.0124 Epoch: 12 Global Step: 160870 Fp16 Grad Scale: 8192 Required: 8 hours 
Training: 2022-04-27 15:39:54,472-Speed 3298.22 samples/sec Loss 3.0363 LearningRate 0.0124 Epoch: 12 Global Step: 160880 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:39:57,596-Speed 3279.15 samples/sec Loss 3.0877 LearningRate 0.0124 Epoch: 12 Global Step: 160890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:00,700-Speed 3299.63 samples/sec Loss 3.1358 LearningRate 0.0124 Epoch: 12 Global Step: 160900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:03,807-Speed 3297.48 samples/sec Loss 3.0893 LearningRate 0.0124 Epoch: 12 Global Step: 160910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:06,934-Speed 3275.39 samples/sec Loss 3.0695 LearningRate 0.0124 Epoch: 12 Global Step: 160920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:10,050-Speed 3287.10 samples/sec Loss 3.1040 LearningRate 0.0124 Epoch: 12 Global Step: 160930 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:13,192-Speed 3260.10 samples/sec Loss 3.1146 LearningRate 0.0124 Epoch: 12 Global Step: 160940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:16,330-Speed 3264.00 samples/sec Loss 3.0738 LearningRate 0.0124 Epoch: 12 Global Step: 160950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:19,432-Speed 3302.82 samples/sec Loss 3.1666 LearningRate 0.0124 Epoch: 12 Global Step: 160960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:22,529-Speed 3307.14 samples/sec Loss 3.0618 LearningRate 0.0124 Epoch: 12 Global Step: 160970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:25,588-Speed 3349.29 samples/sec Loss 3.1043 LearningRate 0.0124 Epoch: 12 Global Step: 160980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:28,665-Speed 3328.52 samples/sec Loss 3.1140 LearningRate 0.0124 Epoch: 12 Global Step: 160990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:40:31,767-Speed 
3302.18 samples/sec Loss 3.0053 LearningRate 0.0124 Epoch: 12 Global Step: 161000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:40:34,826-Speed 3348.78 samples/sec Loss 3.0604 LearningRate 0.0124 Epoch: 12 Global Step: 161010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:37,998-Speed 3229.25 samples/sec Loss 3.1041 LearningRate 0.0124 Epoch: 12 Global Step: 161020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:41,066-Speed 3338.96 samples/sec Loss 3.0701 LearningRate 0.0124 Epoch: 12 Global Step: 161030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:44,127-Speed 3345.87 samples/sec Loss 2.9988 LearningRate 0.0124 Epoch: 12 Global Step: 161040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:47,269-Speed 3260.43 samples/sec Loss 3.0991 LearningRate 0.0124 Epoch: 12 Global Step: 161050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:50,365-Speed 3308.34 samples/sec Loss 3.1037 LearningRate 0.0124 Epoch: 12 Global Step: 161060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:53,556-Speed 3209.86 samples/sec Loss 3.1231 LearningRate 0.0124 Epoch: 12 Global Step: 161070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:56,665-Speed 3294.69 samples/sec Loss 3.0184 LearningRate 0.0124 Epoch: 12 Global Step: 161080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:40:59,810-Speed 3257.68 samples/sec Loss 3.1349 LearningRate 0.0124 Epoch: 12 Global Step: 161090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:02,944-Speed 3267.38 samples/sec Loss 3.1324 LearningRate 0.0124 Epoch: 12 Global Step: 161100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:06,051-Speed 3297.03 samples/sec Loss 3.0826 LearningRate 0.0124 Epoch: 12 Global Step: 161110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:09,103-Speed 3356.81 samples/sec Loss 3.0487 
LearningRate 0.0123 Epoch: 12 Global Step: 161120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:12,217-Speed 3288.65 samples/sec Loss 3.0531 LearningRate 0.0123 Epoch: 12 Global Step: 161130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:15,321-Speed 3300.90 samples/sec Loss 3.0867 LearningRate 0.0123 Epoch: 12 Global Step: 161140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:18,476-Speed 3246.80 samples/sec Loss 2.9775 LearningRate 0.0123 Epoch: 12 Global Step: 161150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:21,541-Speed 3341.79 samples/sec Loss 3.0177 LearningRate 0.0123 Epoch: 12 Global Step: 161160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:24,630-Speed 3316.30 samples/sec Loss 3.1186 LearningRate 0.0123 Epoch: 12 Global Step: 161170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:27,735-Speed 3298.43 samples/sec Loss 3.0815 LearningRate 0.0123 Epoch: 12 Global Step: 161180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:41:30,835-Speed 3304.78 samples/sec Loss 3.0521 LearningRate 0.0123 Epoch: 12 Global Step: 161190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:33,915-Speed 3326.10 samples/sec Loss 3.1246 LearningRate 0.0123 Epoch: 12 Global Step: 161200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:37,001-Speed 3318.95 samples/sec Loss 3.0938 LearningRate 0.0123 Epoch: 12 Global Step: 161210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:40,122-Speed 3281.92 samples/sec Loss 3.1186 LearningRate 0.0123 Epoch: 12 Global Step: 161220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:43,195-Speed 3333.51 samples/sec Loss 3.0584 LearningRate 0.0123 Epoch: 12 Global Step: 161230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:46,268-Speed 3333.33 samples/sec Loss 3.1239 LearningRate 0.0123 Epoch: 12 Global Step: 
161240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:49,366-Speed 3306.21 samples/sec Loss 3.0882 LearningRate 0.0123 Epoch: 12 Global Step: 161250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:52,486-Speed 3282.93 samples/sec Loss 3.0648 LearningRate 0.0123 Epoch: 12 Global Step: 161260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:55,629-Speed 3259.67 samples/sec Loss 3.0155 LearningRate 0.0123 Epoch: 12 Global Step: 161270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:41:58,703-Speed 3331.70 samples/sec Loss 3.0605 LearningRate 0.0123 Epoch: 12 Global Step: 161280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:42:01,854-Speed 3251.00 samples/sec Loss 2.9784 LearningRate 0.0123 Epoch: 12 Global Step: 161290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:42:04,947-Speed 3311.98 samples/sec Loss 3.0686 LearningRate 0.0123 Epoch: 12 Global Step: 161300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:42:08,027-Speed 3325.46 samples/sec Loss 3.0325 LearningRate 0.0123 Epoch: 12 Global Step: 161310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:42:11,113-Speed 3319.37 samples/sec Loss 3.0463 LearningRate 0.0123 Epoch: 12 Global Step: 161320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:42:14,187-Speed 3332.15 samples/sec Loss 3.1664 LearningRate 0.0123 Epoch: 12 Global Step: 161330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:42:17,276-Speed 3316.54 samples/sec Loss 3.1132 LearningRate 0.0123 Epoch: 12 Global Step: 161340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:42:20,356-Speed 3324.94 samples/sec Loss 3.1027 LearningRate 0.0123 Epoch: 12 Global Step: 161350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 15:42:23,460-Speed 3300.62 samples/sec Loss 3.0840 LearningRate 0.0123 Epoch: 12 Global Step: 161360 Fp16 Grad Scale: 32768 Required: 8 
hours Training: 2022-04-27 15:42:26,527-Speed 3339.68 samples/sec Loss 3.1384 LearningRate 0.0123 Epoch: 12 Global Step: 161370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:42:29,609-Speed 3323.76 samples/sec Loss 3.1271 LearningRate 0.0123 Epoch: 12 Global Step: 161380 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:32,701-Speed 3312.23 samples/sec Loss 3.0236 LearningRate 0.0123 Epoch: 12 Global Step: 161390 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:35,898-Speed 3204.65 samples/sec Loss 3.0473 LearningRate 0.0123 Epoch: 12 Global Step: 161400 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:38,990-Speed 3312.27 samples/sec Loss 3.1218 LearningRate 0.0123 Epoch: 12 Global Step: 161410 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:42,077-Speed 3317.89 samples/sec Loss 3.1028 LearningRate 0.0123 Epoch: 12 Global Step: 161420 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:45,128-Speed 3357.32 samples/sec Loss 3.1567 LearningRate 0.0123 Epoch: 12 Global Step: 161430 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:48,283-Speed 3247.15 samples/sec Loss 3.0961 LearningRate 0.0123 Epoch: 12 Global Step: 161440 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:51,357-Speed 3331.98 samples/sec Loss 3.0705 LearningRate 0.0123 Epoch: 12 Global Step: 161450 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:54,504-Speed 3255.14 samples/sec Loss 3.0432 LearningRate 0.0123 Epoch: 12 Global Step: 161460 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:42:57,782-Speed 3124.87 samples/sec Loss 3.0636 LearningRate 0.0123 Epoch: 12 Global Step: 161470 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:43:30,285-Speed 315.06 samples/sec Loss 2.4420 LearningRate 0.0122 Epoch: 13 Global Step: 161480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:43:33,727-Speed 2975.93 
samples/sec Loss 2.2112 LearningRate 0.0122 Epoch: 13 Global Step: 161490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:43:36,885-Speed 3243.34 samples/sec Loss 2.1787 LearningRate 0.0122 Epoch: 13 Global Step: 161500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:43:39,985-Speed 3305.04 samples/sec Loss 2.1425 LearningRate 0.0122 Epoch: 13 Global Step: 161510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:43:43,144-Speed 3241.55 samples/sec Loss 2.2372 LearningRate 0.0122 Epoch: 13 Global Step: 161520 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 15:43:46,187-Speed 3366.56 samples/sec Loss 2.2098 LearningRate 0.0122 Epoch: 13 Global Step: 161530 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:43:49,306-Speed 3284.37 samples/sec Loss 2.1864 LearningRate 0.0122 Epoch: 13 Global Step: 161540 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:43:52,443-Speed 3264.81 samples/sec Loss 2.2009 LearningRate 0.0122 Epoch: 13 Global Step: 161550 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:43:55,522-Speed 3327.56 samples/sec Loss 2.1562 LearningRate 0.0122 Epoch: 13 Global Step: 161560 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 15:43:58,671-Speed 3252.24 samples/sec Loss 2.1587 LearningRate 0.0122 Epoch: 13 Global Step: 161570 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:44:01,803-Speed 3270.30 samples/sec Loss 2.1420 LearningRate 0.0122 Epoch: 13 Global Step: 161580 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:44:04,882-Speed 3327.06 samples/sec Loss 2.2321 LearningRate 0.0122 Epoch: 13 Global Step: 161590 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:44:07,964-Speed 3323.83 samples/sec Loss 2.2273 LearningRate 0.0122 Epoch: 13 Global Step: 161600 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:44:11,061-Speed 3307.08 samples/sec Loss 2.1623 LearningRate 0.0122 Epoch: 
13 Global Step: 161610 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:44:14,175-Speed 3289.81 samples/sec Loss 2.1869 LearningRate 0.0122 Epoch: 13 Global Step: 161620 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:44:17,285-Speed 3294.08 samples/sec Loss 2.1619 LearningRate 0.0122 Epoch: 13 Global Step: 161630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:20,363-Speed 3328.05 samples/sec Loss 2.2244 LearningRate 0.0122 Epoch: 13 Global Step: 161640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:23,508-Speed 3256.00 samples/sec Loss 2.2350 LearningRate 0.0122 Epoch: 13 Global Step: 161650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:26,587-Speed 3327.63 samples/sec Loss 2.2567 LearningRate 0.0122 Epoch: 13 Global Step: 161660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:29,720-Speed 3269.80 samples/sec Loss 2.1562 LearningRate 0.0122 Epoch: 13 Global Step: 161670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:32,800-Speed 3325.20 samples/sec Loss 2.1731 LearningRate 0.0122 Epoch: 13 Global Step: 161680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:35,955-Speed 3246.43 samples/sec Loss 2.2790 LearningRate 0.0122 Epoch: 13 Global Step: 161690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:39,045-Speed 3315.73 samples/sec Loss 2.1478 LearningRate 0.0122 Epoch: 13 Global Step: 161700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:42,204-Speed 3242.57 samples/sec Loss 2.2071 LearningRate 0.0122 Epoch: 13 Global Step: 161710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:45,293-Speed 3315.55 samples/sec Loss 2.2110 LearningRate 0.0122 Epoch: 13 Global Step: 161720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:48,382-Speed 3316.49 samples/sec Loss 2.2288 LearningRate 0.0122 Epoch: 13 Global Step: 161730 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 15:44:51,446-Speed 3342.46 samples/sec Loss 2.1842 LearningRate 0.0122 Epoch: 13 Global Step: 161740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:54,555-Speed 3295.05 samples/sec Loss 2.2310 LearningRate 0.0122 Epoch: 13 Global Step: 161750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:44:57,615-Speed 3347.59 samples/sec Loss 2.2283 LearningRate 0.0122 Epoch: 13 Global Step: 161760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:00,819-Speed 3197.46 samples/sec Loss 2.2098 LearningRate 0.0122 Epoch: 13 Global Step: 161770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:04,109-Speed 3113.14 samples/sec Loss 2.1716 LearningRate 0.0122 Epoch: 13 Global Step: 161780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:07,230-Speed 3282.29 samples/sec Loss 2.2018 LearningRate 0.0122 Epoch: 13 Global Step: 161790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:10,296-Speed 3340.09 samples/sec Loss 2.1824 LearningRate 0.0122 Epoch: 13 Global Step: 161800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:13,394-Speed 3307.12 samples/sec Loss 2.1949 LearningRate 0.0122 Epoch: 13 Global Step: 161810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:16,543-Speed 3252.19 samples/sec Loss 2.2203 LearningRate 0.0122 Epoch: 13 Global Step: 161820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:19,668-Speed 3278.23 samples/sec Loss 2.2603 LearningRate 0.0121 Epoch: 13 Global Step: 161830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:45:22,724-Speed 3351.82 samples/sec Loss 2.2229 LearningRate 0.0121 Epoch: 13 Global Step: 161840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:45:25,803-Speed 3326.68 samples/sec Loss 2.2521 LearningRate 0.0121 Epoch: 13 Global Step: 161850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
15:45:28,865-Speed 3345.82 samples/sec Loss 2.2191 LearningRate 0.0121 Epoch: 13 Global Step: 161860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:32,026-Speed 3240.73 samples/sec Loss 2.1890 LearningRate 0.0121 Epoch: 13 Global Step: 161870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:35,144-Speed 3284.46 samples/sec Loss 2.1857 LearningRate 0.0121 Epoch: 13 Global Step: 161880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:38,254-Speed 3293.63 samples/sec Loss 2.2285 LearningRate 0.0121 Epoch: 13 Global Step: 161890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:41,440-Speed 3215.68 samples/sec Loss 2.1954 LearningRate 0.0121 Epoch: 13 Global Step: 161900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:44,524-Speed 3321.49 samples/sec Loss 2.2477 LearningRate 0.0121 Epoch: 13 Global Step: 161910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:47,586-Speed 3344.32 samples/sec Loss 2.2191 LearningRate 0.0121 Epoch: 13 Global Step: 161920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:50,726-Speed 3262.79 samples/sec Loss 2.2145 LearningRate 0.0121 Epoch: 13 Global Step: 161930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:53,828-Speed 3302.67 samples/sec Loss 2.2451 LearningRate 0.0121 Epoch: 13 Global Step: 161940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:45:56,903-Speed 3329.93 samples/sec Loss 2.2575 LearningRate 0.0121 Epoch: 13 Global Step: 161950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:00,031-Speed 3275.50 samples/sec Loss 2.1970 LearningRate 0.0121 Epoch: 13 Global Step: 161960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:03,133-Speed 3302.00 samples/sec Loss 2.2712 LearningRate 0.0121 Epoch: 13 Global Step: 161970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:06,209-Speed 3330.42 samples/sec Loss 
2.2727 LearningRate 0.0121 Epoch: 13 Global Step: 161980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:09,256-Speed 3361.63 samples/sec Loss 2.2530 LearningRate 0.0121 Epoch: 13 Global Step: 161990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:12,398-Speed 3259.51 samples/sec Loss 2.2601 LearningRate 0.0121 Epoch: 13 Global Step: 162000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:15,499-Speed 3303.34 samples/sec Loss 2.3024 LearningRate 0.0121 Epoch: 13 Global Step: 162010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:18,680-Speed 3219.59 samples/sec Loss 2.2036 LearningRate 0.0121 Epoch: 13 Global Step: 162020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:21,755-Speed 3331.00 samples/sec Loss 2.2535 LearningRate 0.0121 Epoch: 13 Global Step: 162030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:24,839-Speed 3322.57 samples/sec Loss 2.3049 LearningRate 0.0121 Epoch: 13 Global Step: 162040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:46:27,962-Speed 3279.01 samples/sec Loss 2.2299 LearningRate 0.0121 Epoch: 13 Global Step: 162050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 15:46:31,077-Speed 3288.19 samples/sec Loss 2.1972 LearningRate 0.0121 Epoch: 13 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 15:46:34,133-Speed 3352.22 samples/sec Loss 2.1756 LearningRate 0.0121 Epoch: 13 Global Step: 162070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:37,270-Speed 3265.85 samples/sec Loss 2.2316 LearningRate 0.0121 Epoch: 13 Global Step: 162080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:40,318-Speed 3359.44 samples/sec Loss 2.2149 LearningRate 0.0121 Epoch: 13 Global Step: 162090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:43,385-Speed 3339.89 samples/sec Loss 2.1908 LearningRate 0.0121 Epoch: 13 Global 
Step: 162100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:46,471-Speed 3320.05 samples/sec Loss 2.2534 LearningRate 0.0121 Epoch: 13 Global Step: 162110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:49,531-Speed 3347.22 samples/sec Loss 2.2700 LearningRate 0.0121 Epoch: 13 Global Step: 162120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:52,644-Speed 3290.04 samples/sec Loss 2.2465 LearningRate 0.0121 Epoch: 13 Global Step: 162130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:55,703-Speed 3348.77 samples/sec Loss 2.2067 LearningRate 0.0121 Epoch: 13 Global Step: 162140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:46:58,749-Speed 3362.77 samples/sec Loss 2.1860 LearningRate 0.0121 Epoch: 13 Global Step: 162150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:47:01,827-Speed 3328.09 samples/sec Loss 2.3063 LearningRate 0.0121 Epoch: 13 Global Step: 162160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:47:04,877-Speed 3357.73 samples/sec Loss 2.2533 LearningRate 0.0121 Epoch: 13 Global Step: 162170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:07,980-Speed 3301.83 samples/sec Loss 2.3307 LearningRate 0.0121 Epoch: 13 Global Step: 162180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:11,054-Speed 3332.34 samples/sec Loss 2.2128 LearningRate 0.0120 Epoch: 13 Global Step: 162190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:14,153-Speed 3304.43 samples/sec Loss 2.1570 LearningRate 0.0120 Epoch: 13 Global Step: 162200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:17,358-Speed 3196.98 samples/sec Loss 2.2796 LearningRate 0.0120 Epoch: 13 Global Step: 162210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:20,427-Speed 3337.70 samples/sec Loss 2.2261 LearningRate 0.0120 Epoch: 13 Global Step: 162220 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 15:47:23,494-Speed 3340.21 samples/sec Loss 2.2249 LearningRate 0.0120 Epoch: 13 Global Step: 162230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:26,604-Speed 3292.88 samples/sec Loss 2.2553 LearningRate 0.0120 Epoch: 13 Global Step: 162240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:29,709-Speed 3298.96 samples/sec Loss 2.2794 LearningRate 0.0120 Epoch: 13 Global Step: 162250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:32,825-Speed 3287.42 samples/sec Loss 2.3075 LearningRate 0.0120 Epoch: 13 Global Step: 162260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:35,884-Speed 3349.24 samples/sec Loss 2.2973 LearningRate 0.0120 Epoch: 13 Global Step: 162270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 15:47:39,011-Speed 3275.33 samples/sec Loss 2.2204 LearningRate 0.0120 Epoch: 13 Global Step: 162280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:42,150-Speed 3263.16 samples/sec Loss 2.2082 LearningRate 0.0120 Epoch: 13 Global Step: 162290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:45,241-Speed 3314.41 samples/sec Loss 2.2712 LearningRate 0.0120 Epoch: 13 Global Step: 162300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:48,313-Speed 3334.44 samples/sec Loss 2.2346 LearningRate 0.0120 Epoch: 13 Global Step: 162310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:51,405-Speed 3312.30 samples/sec Loss 2.2378 LearningRate 0.0120 Epoch: 13 Global Step: 162320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:54,475-Speed 3337.14 samples/sec Loss 2.2436 LearningRate 0.0120 Epoch: 13 Global Step: 162330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:47:57,581-Speed 3298.07 samples/sec Loss 2.3089 LearningRate 0.0120 Epoch: 13 Global Step: 162340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
15:48:00,637-Speed 3351.93 samples/sec Loss 2.2408 LearningRate 0.0120 Epoch: 13 Global Step: 162350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:03,726-Speed 3315.78 samples/sec Loss 2.3917 LearningRate 0.0120 Epoch: 13 Global Step: 162360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:06,828-Speed 3302.64 samples/sec Loss 2.2541 LearningRate 0.0120 Epoch: 13 Global Step: 162370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:09,888-Speed 3346.65 samples/sec Loss 2.2936 LearningRate 0.0120 Epoch: 13 Global Step: 162380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:12,959-Speed 3336.02 samples/sec Loss 2.2613 LearningRate 0.0120 Epoch: 13 Global Step: 162390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:16,036-Speed 3328.24 samples/sec Loss 2.2752 LearningRate 0.0120 Epoch: 13 Global Step: 162400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:19,168-Speed 3271.17 samples/sec Loss 2.3268 LearningRate 0.0120 Epoch: 13 Global Step: 162410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:22,281-Speed 3289.94 samples/sec Loss 2.2505 LearningRate 0.0120 Epoch: 13 Global Step: 162420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:25,411-Speed 3272.95 samples/sec Loss 2.2742 LearningRate 0.0120 Epoch: 13 Global Step: 162430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:28,513-Speed 3301.76 samples/sec Loss 2.2724 LearningRate 0.0120 Epoch: 13 Global Step: 162440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:48:31,658-Speed 3257.38 samples/sec Loss 2.3033 LearningRate 0.0120 Epoch: 13 Global Step: 162450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:34,839-Speed 3220.43 samples/sec Loss 2.2923 LearningRate 0.0120 Epoch: 13 Global Step: 162460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:38,001-Speed 3239.75 samples/sec Loss 
2.2622 LearningRate 0.0120 Epoch: 13 Global Step: 162470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:41,109-Speed 3296.06 samples/sec Loss 2.2953 LearningRate 0.0120 Epoch: 13 Global Step: 162480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:44,203-Speed 3309.92 samples/sec Loss 2.2927 LearningRate 0.0120 Epoch: 13 Global Step: 162490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:47,375-Speed 3229.55 samples/sec Loss 2.3105 LearningRate 0.0120 Epoch: 13 Global Step: 162500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:50,569-Speed 3207.32 samples/sec Loss 2.2943 LearningRate 0.0120 Epoch: 13 Global Step: 162510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:53,687-Speed 3284.37 samples/sec Loss 2.3128 LearningRate 0.0120 Epoch: 13 Global Step: 162520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:56,763-Speed 3330.67 samples/sec Loss 2.3214 LearningRate 0.0120 Epoch: 13 Global Step: 162530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:48:59,933-Speed 3231.12 samples/sec Loss 2.3008 LearningRate 0.0120 Epoch: 13 Global Step: 162540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:49:03,113-Speed 3221.53 samples/sec Loss 2.3024 LearningRate 0.0119 Epoch: 13 Global Step: 162550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:49:06,325-Speed 3188.78 samples/sec Loss 2.3319 LearningRate 0.0119 Epoch: 13 Global Step: 162560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:49:09,376-Speed 3357.36 samples/sec Loss 2.2908 LearningRate 0.0119 Epoch: 13 Global Step: 162570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:12,482-Speed 3297.71 samples/sec Loss 2.3347 LearningRate 0.0119 Epoch: 13 Global Step: 162580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:15,660-Speed 3223.89 samples/sec Loss 2.3253 LearningRate 0.0119 Epoch: 13 Global 
Step: 162590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:18,787-Speed 3275.70 samples/sec Loss 2.3485 LearningRate 0.0119 Epoch: 13 Global Step: 162600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:21,874-Speed 3318.13 samples/sec Loss 2.3099 LearningRate 0.0119 Epoch: 13 Global Step: 162610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:25,046-Speed 3229.55 samples/sec Loss 2.3401 LearningRate 0.0119 Epoch: 13 Global Step: 162620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:28,239-Speed 3208.14 samples/sec Loss 2.3034 LearningRate 0.0119 Epoch: 13 Global Step: 162630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:31,329-Speed 3314.53 samples/sec Loss 2.2830 LearningRate 0.0119 Epoch: 13 Global Step: 162640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:34,421-Speed 3312.74 samples/sec Loss 2.3266 LearningRate 0.0119 Epoch: 13 Global Step: 162650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:37,515-Speed 3311.22 samples/sec Loss 2.2962 LearningRate 0.0119 Epoch: 13 Global Step: 162660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:40,634-Speed 3283.81 samples/sec Loss 2.2327 LearningRate 0.0119 Epoch: 13 Global Step: 162670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:49:43,791-Speed 3245.06 samples/sec Loss 2.3333 LearningRate 0.0119 Epoch: 13 Global Step: 162680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:46,855-Speed 3342.08 samples/sec Loss 2.3280 LearningRate 0.0119 Epoch: 13 Global Step: 162690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:49,953-Speed 3306.94 samples/sec Loss 2.2405 LearningRate 0.0119 Epoch: 13 Global Step: 162700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:53,183-Speed 3171.31 samples/sec Loss 2.3453 LearningRate 0.0119 Epoch: 13 Global Step: 162710 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 15:49:56,253-Speed 3336.11 samples/sec Loss 2.3023 LearningRate 0.0119 Epoch: 13 Global Step: 162720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:49:59,303-Speed 3358.50 samples/sec Loss 2.3100 LearningRate 0.0119 Epoch: 13 Global Step: 162730 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:02,407-Speed 3300.15 samples/sec Loss 2.2419 LearningRate 0.0119 Epoch: 13 Global Step: 162740 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:05,522-Speed 3288.86 samples/sec Loss 2.3528 LearningRate 0.0119 Epoch: 13 Global Step: 162750 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:08,607-Speed 3320.22 samples/sec Loss 2.3678 LearningRate 0.0119 Epoch: 13 Global Step: 162760 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:11,712-Speed 3299.65 samples/sec Loss 2.4323 LearningRate 0.0119 Epoch: 13 Global Step: 162770 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:14,801-Speed 3315.13 samples/sec Loss 2.3152 LearningRate 0.0119 Epoch: 13 Global Step: 162780 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:17,965-Speed 3238.40 samples/sec Loss 2.3439 LearningRate 0.0119 Epoch: 13 Global Step: 162790 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:21,046-Speed 3324.75 samples/sec Loss 2.2915 LearningRate 0.0119 Epoch: 13 Global Step: 162800 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:24,168-Speed 3280.67 samples/sec Loss 2.3197 LearningRate 0.0119 Epoch: 13 Global Step: 162810 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:27,304-Speed 3265.51 samples/sec Loss 2.2787 LearningRate 0.0119 Epoch: 13 Global Step: 162820 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:50:30,479-Speed 3226.14 samples/sec Loss 2.3508 LearningRate 0.0119 Epoch: 13 Global Step: 162830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
15:50:33,571-Speed 3313.73 samples/sec Loss 2.3081 LearningRate 0.0119 Epoch: 13 Global Step: 162840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:36,649-Speed 3328.02 samples/sec Loss 2.3780 LearningRate 0.0119 Epoch: 13 Global Step: 162850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:39,761-Speed 3291.13 samples/sec Loss 2.3374 LearningRate 0.0119 Epoch: 13 Global Step: 162860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:42,919-Speed 3243.68 samples/sec Loss 2.3253 LearningRate 0.0119 Epoch: 13 Global Step: 162870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:46,011-Speed 3312.37 samples/sec Loss 2.3813 LearningRate 0.0119 Epoch: 13 Global Step: 162880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:49,104-Speed 3311.93 samples/sec Loss 2.2835 LearningRate 0.0119 Epoch: 13 Global Step: 162890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:52,268-Speed 3237.07 samples/sec Loss 2.3052 LearningRate 0.0119 Epoch: 13 Global Step: 162900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:55,390-Speed 3281.33 samples/sec Loss 2.4063 LearningRate 0.0118 Epoch: 13 Global Step: 162910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:50:58,541-Speed 3250.80 samples/sec Loss 2.4309 LearningRate 0.0118 Epoch: 13 Global Step: 162920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:01,735-Speed 3207.21 samples/sec Loss 2.3522 LearningRate 0.0118 Epoch: 13 Global Step: 162930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:04,844-Speed 3294.40 samples/sec Loss 2.3545 LearningRate 0.0118 Epoch: 13 Global Step: 162940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:07,954-Speed 3293.36 samples/sec Loss 2.3393 LearningRate 0.0118 Epoch: 13 Global Step: 162950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:11,127-Speed 3228.92 samples/sec Loss 
2.3409 LearningRate 0.0118 Epoch: 13 Global Step: 162960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:14,281-Speed 3247.30 samples/sec Loss 2.3610 LearningRate 0.0118 Epoch: 13 Global Step: 162970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:17,419-Speed 3264.55 samples/sec Loss 2.4013 LearningRate 0.0118 Epoch: 13 Global Step: 162980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:20,511-Speed 3312.02 samples/sec Loss 2.3304 LearningRate 0.0118 Epoch: 13 Global Step: 162990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:23,637-Speed 3276.58 samples/sec Loss 2.3506 LearningRate 0.0118 Epoch: 13 Global Step: 163000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:51:26,789-Speed 3249.97 samples/sec Loss 2.3951 LearningRate 0.0118 Epoch: 13 Global Step: 163010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:29,860-Speed 3335.28 samples/sec Loss 2.3554 LearningRate 0.0118 Epoch: 13 Global Step: 163020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:32,922-Speed 3345.85 samples/sec Loss 2.3779 LearningRate 0.0118 Epoch: 13 Global Step: 163030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:36,082-Speed 3242.21 samples/sec Loss 2.3503 LearningRate 0.0118 Epoch: 13 Global Step: 163040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:39,228-Speed 3255.84 samples/sec Loss 2.3143 LearningRate 0.0118 Epoch: 13 Global Step: 163050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:42,290-Speed 3344.47 samples/sec Loss 2.4011 LearningRate 0.0118 Epoch: 13 Global Step: 163060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:45,361-Speed 3335.57 samples/sec Loss 2.3720 LearningRate 0.0118 Epoch: 13 Global Step: 163070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:48,410-Speed 3360.13 samples/sec Loss 2.3796 LearningRate 0.0118 Epoch: 13 Global 
Step: 163080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:51,571-Speed 3240.37 samples/sec Loss 2.4026 LearningRate 0.0118 Epoch: 13 Global Step: 163090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:54,744-Speed 3228.00 samples/sec Loss 2.3346 LearningRate 0.0118 Epoch: 13 Global Step: 163100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:51:57,825-Speed 3325.22 samples/sec Loss 2.3558 LearningRate 0.0118 Epoch: 13 Global Step: 163110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:00,895-Speed 3335.68 samples/sec Loss 2.4100 LearningRate 0.0118 Epoch: 13 Global Step: 163120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:03,998-Speed 3301.61 samples/sec Loss 2.3667 LearningRate 0.0118 Epoch: 13 Global Step: 163130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:07,145-Speed 3254.64 samples/sec Loss 2.3996 LearningRate 0.0118 Epoch: 13 Global Step: 163140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:10,232-Speed 3318.89 samples/sec Loss 2.3973 LearningRate 0.0118 Epoch: 13 Global Step: 163150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:13,330-Speed 3305.57 samples/sec Loss 2.2926 LearningRate 0.0118 Epoch: 13 Global Step: 163160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:16,409-Speed 3327.66 samples/sec Loss 2.3300 LearningRate 0.0118 Epoch: 13 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:19,501-Speed 3312.78 samples/sec Loss 2.3857 LearningRate 0.0118 Epoch: 13 Global Step: 163180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:22,560-Speed 3347.69 samples/sec Loss 2.4282 LearningRate 0.0118 Epoch: 13 Global Step: 163190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:25,689-Speed 3274.63 samples/sec Loss 2.3154 LearningRate 0.0118 Epoch: 13 Global Step: 163200 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 15:52:28,797-Speed 3295.80 samples/sec Loss 2.3264 LearningRate 0.0118 Epoch: 13 Global Step: 163210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:31,854-Speed 3350.02 samples/sec Loss 2.3555 LearningRate 0.0118 Epoch: 13 Global Step: 163220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:34,919-Speed 3342.00 samples/sec Loss 2.3861 LearningRate 0.0118 Epoch: 13 Global Step: 163230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:38,040-Speed 3282.97 samples/sec Loss 2.3782 LearningRate 0.0118 Epoch: 13 Global Step: 163240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:41,125-Speed 3320.18 samples/sec Loss 2.3659 LearningRate 0.0118 Epoch: 13 Global Step: 163250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:44,198-Speed 3332.77 samples/sec Loss 2.3642 LearningRate 0.0118 Epoch: 13 Global Step: 163260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:47,285-Speed 3318.97 samples/sec Loss 2.3580 LearningRate 0.0117 Epoch: 13 Global Step: 163270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:50,341-Speed 3351.34 samples/sec Loss 2.3871 LearningRate 0.0117 Epoch: 13 Global Step: 163280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:52:53,540-Speed 3202.34 samples/sec Loss 2.3240 LearningRate 0.0117 Epoch: 13 Global Step: 163290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:56,670-Speed 3272.30 samples/sec Loss 2.3455 LearningRate 0.0117 Epoch: 13 Global Step: 163300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:52:59,786-Speed 3287.07 samples/sec Loss 2.3639 LearningRate 0.0117 Epoch: 13 Global Step: 163310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:53:02,945-Speed 3243.25 samples/sec Loss 2.3081 LearningRate 0.0117 Epoch: 13 Global Step: 163320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
15:53:06,123-Speed 3222.40 samples/sec Loss 2.3274 LearningRate 0.0117 Epoch: 13 Global Step: 163330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:53:09,220-Speed 3307.53 samples/sec Loss 2.3262 LearningRate 0.0117 Epoch: 13 Global Step: 163340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:53:12,252-Speed 3379.32 samples/sec Loss 2.3972 LearningRate 0.0117 Epoch: 13 Global Step: 163350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:15,358-Speed 3298.08 samples/sec Loss 2.3755 LearningRate 0.0117 Epoch: 13 Global Step: 163360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:18,483-Speed 3277.53 samples/sec Loss 2.3657 LearningRate 0.0117 Epoch: 13 Global Step: 163370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:21,558-Speed 3330.82 samples/sec Loss 2.4556 LearningRate 0.0117 Epoch: 13 Global Step: 163380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:24,685-Speed 3276.91 samples/sec Loss 2.4328 LearningRate 0.0117 Epoch: 13 Global Step: 163390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:27,810-Speed 3278.03 samples/sec Loss 2.3374 LearningRate 0.0117 Epoch: 13 Global Step: 163400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:30,972-Speed 3238.57 samples/sec Loss 2.4008 LearningRate 0.0117 Epoch: 13 Global Step: 163410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:34,093-Speed 3281.84 samples/sec Loss 2.3902 LearningRate 0.0117 Epoch: 13 Global Step: 163420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:37,297-Speed 3197.32 samples/sec Loss 2.3039 LearningRate 0.0117 Epoch: 13 Global Step: 163430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:40,429-Speed 3270.29 samples/sec Loss 2.3510 LearningRate 0.0117 Epoch: 13 Global Step: 163440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:43,572-Speed 3259.42 samples/sec Loss 
2.4309 LearningRate 0.0117 Epoch: 13 Global Step: 163450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:53:46,636-Speed 3343.15 samples/sec Loss 2.3946 LearningRate 0.0117 Epoch: 13 Global Step: 163460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:49,791-Speed 3246.53 samples/sec Loss 2.4849 LearningRate 0.0117 Epoch: 13 Global Step: 163470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:52,865-Speed 3332.58 samples/sec Loss 2.4122 LearningRate 0.0117 Epoch: 13 Global Step: 163480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:55,923-Speed 3349.31 samples/sec Loss 2.4056 LearningRate 0.0117 Epoch: 13 Global Step: 163490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:53:58,999-Speed 3329.77 samples/sec Loss 2.3737 LearningRate 0.0117 Epoch: 13 Global Step: 163500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:02,183-Speed 3217.21 samples/sec Loss 2.3568 LearningRate 0.0117 Epoch: 13 Global Step: 163510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:05,263-Speed 3326.16 samples/sec Loss 2.3416 LearningRate 0.0117 Epoch: 13 Global Step: 163520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:08,389-Speed 3278.29 samples/sec Loss 2.3893 LearningRate 0.0117 Epoch: 13 Global Step: 163530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:11,468-Speed 3326.80 samples/sec Loss 2.3546 LearningRate 0.0117 Epoch: 13 Global Step: 163540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:14,647-Speed 3222.84 samples/sec Loss 2.3874 LearningRate 0.0117 Epoch: 13 Global Step: 163550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:17,748-Speed 3302.77 samples/sec Loss 2.3918 LearningRate 0.0117 Epoch: 13 Global Step: 163560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:54:20,852-Speed 3300.59 samples/sec Loss 2.3503 LearningRate 0.0117 Epoch: 13 Global 
Step: 163570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:54:23,917-Speed 3341.58 samples/sec Loss 2.4114 LearningRate 0.0117 Epoch: 13 Global Step: 163580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:27,028-Speed 3292.66 samples/sec Loss 2.3159 LearningRate 0.0117 Epoch: 13 Global Step: 163590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:30,108-Speed 3325.20 samples/sec Loss 2.4299 LearningRate 0.0117 Epoch: 13 Global Step: 163600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:33,180-Speed 3334.40 samples/sec Loss 2.4233 LearningRate 0.0117 Epoch: 13 Global Step: 163610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:36,272-Speed 3313.18 samples/sec Loss 2.4065 LearningRate 0.0117 Epoch: 13 Global Step: 163620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:39,356-Speed 3321.58 samples/sec Loss 2.4022 LearningRate 0.0116 Epoch: 13 Global Step: 163630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:42,468-Speed 3292.13 samples/sec Loss 2.4049 LearningRate 0.0116 Epoch: 13 Global Step: 163640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:45,563-Speed 3308.91 samples/sec Loss 2.3761 LearningRate 0.0116 Epoch: 13 Global Step: 163650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:48,675-Speed 3291.43 samples/sec Loss 2.3810 LearningRate 0.0116 Epoch: 13 Global Step: 163660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:51,826-Speed 3251.40 samples/sec Loss 2.3874 LearningRate 0.0116 Epoch: 13 Global Step: 163670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:54:54,927-Speed 3303.58 samples/sec Loss 2.3713 LearningRate 0.0116 Epoch: 13 Global Step: 163680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:54:58,044-Speed 3286.32 samples/sec Loss 2.3160 LearningRate 0.0116 Epoch: 13 Global Step: 163690 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 15:55:01,176-Speed 3270.25 samples/sec Loss 2.3669 LearningRate 0.0116 Epoch: 13 Global Step: 163700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:55:04,278-Speed 3301.98 samples/sec Loss 2.4631 LearningRate 0.0116 Epoch: 13 Global Step: 163710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:55:07,426-Speed 3254.61 samples/sec Loss 2.3950 LearningRate 0.0116 Epoch: 13 Global Step: 163720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:55:10,532-Speed 3297.88 samples/sec Loss 2.4020 LearningRate 0.0116 Epoch: 13 Global Step: 163730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:55:13,625-Speed 3311.27 samples/sec Loss 2.4131 LearningRate 0.0116 Epoch: 13 Global Step: 163740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:16,703-Speed 3329.00 samples/sec Loss 2.3534 LearningRate 0.0116 Epoch: 13 Global Step: 163750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:19,830-Speed 3275.47 samples/sec Loss 2.4141 LearningRate 0.0116 Epoch: 13 Global Step: 163760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:22,921-Speed 3313.77 samples/sec Loss 2.4009 LearningRate 0.0116 Epoch: 13 Global Step: 163770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:26,213-Speed 3111.99 samples/sec Loss 2.3523 LearningRate 0.0116 Epoch: 13 Global Step: 163780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:29,396-Speed 3217.04 samples/sec Loss 2.3325 LearningRate 0.0116 Epoch: 13 Global Step: 163790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:32,456-Speed 3347.88 samples/sec Loss 2.4688 LearningRate 0.0116 Epoch: 13 Global Step: 163800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:35,559-Speed 3300.96 samples/sec Loss 2.4416 LearningRate 0.0116 Epoch: 13 Global Step: 163810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
15:55:38,670-Speed 3293.34 samples/sec Loss 2.3788 LearningRate 0.0116 Epoch: 13 Global Step: 163820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:41,764-Speed 3311.10 samples/sec Loss 2.4550 LearningRate 0.0116 Epoch: 13 Global Step: 163830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:44,856-Speed 3312.46 samples/sec Loss 2.4144 LearningRate 0.0116 Epoch: 13 Global Step: 163840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:55:47,917-Speed 3346.38 samples/sec Loss 2.3660 LearningRate 0.0116 Epoch: 13 Global Step: 163850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:51,087-Speed 3230.73 samples/sec Loss 2.4966 LearningRate 0.0116 Epoch: 13 Global Step: 163860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:54,173-Speed 3320.05 samples/sec Loss 2.3775 LearningRate 0.0116 Epoch: 13 Global Step: 163870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:55:57,240-Speed 3340.19 samples/sec Loss 2.3660 LearningRate 0.0116 Epoch: 13 Global Step: 163880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:00,336-Speed 3308.28 samples/sec Loss 2.3552 LearningRate 0.0116 Epoch: 13 Global Step: 163890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:03,431-Speed 3309.23 samples/sec Loss 2.4161 LearningRate 0.0116 Epoch: 13 Global Step: 163900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:06,518-Speed 3319.25 samples/sec Loss 2.2838 LearningRate 0.0116 Epoch: 13 Global Step: 163910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:09,591-Speed 3332.17 samples/sec Loss 2.3959 LearningRate 0.0116 Epoch: 13 Global Step: 163920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:12,743-Speed 3250.61 samples/sec Loss 2.3638 LearningRate 0.0116 Epoch: 13 Global Step: 163930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:15,878-Speed 3267.13 samples/sec Loss 
2.4182 LearningRate 0.0116 Epoch: 13 Global Step: 163940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:18,955-Speed 3329.33 samples/sec Loss 2.4164 LearningRate 0.0116 Epoch: 13 Global Step: 163950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:56:22,007-Speed 3356.34 samples/sec Loss 2.4277 LearningRate 0.0116 Epoch: 13 Global Step: 163960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:56:25,141-Speed 3268.38 samples/sec Loss 2.4327 LearningRate 0.0116 Epoch: 13 Global Step: 163970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:56:28,212-Speed 3336.30 samples/sec Loss 2.4100 LearningRate 0.0116 Epoch: 13 Global Step: 163980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:56:31,291-Speed 3325.89 samples/sec Loss 2.4033 LearningRate 0.0116 Epoch: 13 Global Step: 163990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:56:34,350-Speed 3349.12 samples/sec Loss 2.4375 LearningRate 0.0115 Epoch: 13 Global Step: 164000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:37,428-Speed 3327.82 samples/sec Loss 2.4416 LearningRate 0.0115 Epoch: 13 Global Step: 164010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:40,533-Speed 3299.55 samples/sec Loss 2.4141 LearningRate 0.0115 Epoch: 13 Global Step: 164020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:43,590-Speed 3351.10 samples/sec Loss 2.3127 LearningRate 0.0115 Epoch: 13 Global Step: 164030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:46,767-Speed 3223.47 samples/sec Loss 2.3335 LearningRate 0.0115 Epoch: 13 Global Step: 164040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:49,866-Speed 3305.78 samples/sec Loss 2.4126 LearningRate 0.0115 Epoch: 13 Global Step: 164050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:52,984-Speed 3285.10 samples/sec Loss 2.4181 LearningRate 0.0115 Epoch: 13 Global 
Step: 164060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:56,092-Speed 3296.63 samples/sec Loss 2.4601 LearningRate 0.0115 Epoch: 13 Global Step: 164070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:56:59,174-Speed 3322.63 samples/sec Loss 2.4113 LearningRate 0.0115 Epoch: 13 Global Step: 164080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:02,279-Speed 3299.30 samples/sec Loss 2.4618 LearningRate 0.0115 Epoch: 13 Global Step: 164090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:05,454-Speed 3226.60 samples/sec Loss 2.4127 LearningRate 0.0115 Epoch: 13 Global Step: 164100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:57:08,545-Speed 3313.97 samples/sec Loss 2.4361 LearningRate 0.0115 Epoch: 13 Global Step: 164110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:57:11,625-Speed 3325.60 samples/sec Loss 2.4064 LearningRate 0.0115 Epoch: 13 Global Step: 164120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:57:14,754-Speed 3273.76 samples/sec Loss 2.4808 LearningRate 0.0115 Epoch: 13 Global Step: 164130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:57:17,862-Speed 3296.26 samples/sec Loss 2.4321 LearningRate 0.0115 Epoch: 13 Global Step: 164140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:57:20,894-Speed 3377.53 samples/sec Loss 2.4197 LearningRate 0.0115 Epoch: 13 Global Step: 164150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:23,983-Speed 3316.54 samples/sec Loss 2.4294 LearningRate 0.0115 Epoch: 13 Global Step: 164160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:27,077-Speed 3310.13 samples/sec Loss 2.4535 LearningRate 0.0115 Epoch: 13 Global Step: 164170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:30,198-Speed 3282.50 samples/sec Loss 2.4583 LearningRate 0.0115 Epoch: 13 Global Step: 164180 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 15:57:33,325-Speed 3275.25 samples/sec Loss 2.4693 LearningRate 0.0115 Epoch: 13 Global Step: 164190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:36,433-Speed 3296.18 samples/sec Loss 2.4832 LearningRate 0.0115 Epoch: 13 Global Step: 164200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:39,567-Speed 3268.53 samples/sec Loss 2.3766 LearningRate 0.0115 Epoch: 13 Global Step: 164210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:42,682-Speed 3288.11 samples/sec Loss 2.4408 LearningRate 0.0115 Epoch: 13 Global Step: 164220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:45,751-Speed 3338.38 samples/sec Loss 2.4887 LearningRate 0.0115 Epoch: 13 Global Step: 164230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:48,827-Speed 3329.06 samples/sec Loss 2.5163 LearningRate 0.0115 Epoch: 13 Global Step: 164240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:57:51,938-Speed 3293.28 samples/sec Loss 2.4183 LearningRate 0.0115 Epoch: 13 Global Step: 164250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:57:55,029-Speed 3313.49 samples/sec Loss 2.3795 LearningRate 0.0115 Epoch: 13 Global Step: 164260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:57:58,097-Speed 3339.21 samples/sec Loss 2.4347 LearningRate 0.0115 Epoch: 13 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:01,248-Speed 3250.26 samples/sec Loss 2.3761 LearningRate 0.0115 Epoch: 13 Global Step: 164280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:04,325-Speed 3329.07 samples/sec Loss 2.4876 LearningRate 0.0115 Epoch: 13 Global Step: 164290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:07,432-Speed 3296.83 samples/sec Loss 2.4382 LearningRate 0.0115 Epoch: 13 Global Step: 164300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
15:58:10,535-Speed 3302.11 samples/sec Loss 2.4455 LearningRate 0.0115 Epoch: 13 Global Step: 164310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:13,701-Speed 3235.38 samples/sec Loss 2.4131 LearningRate 0.0115 Epoch: 13 Global Step: 164320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:16,878-Speed 3223.45 samples/sec Loss 2.4575 LearningRate 0.0115 Epoch: 13 Global Step: 164330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:19,959-Speed 3325.30 samples/sec Loss 2.4376 LearningRate 0.0115 Epoch: 13 Global Step: 164340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:23,044-Speed 3319.82 samples/sec Loss 2.4180 LearningRate 0.0115 Epoch: 13 Global Step: 164350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:26,178-Speed 3268.64 samples/sec Loss 2.4156 LearningRate 0.0115 Epoch: 13 Global Step: 164360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:29,276-Speed 3306.22 samples/sec Loss 2.4209 LearningRate 0.0114 Epoch: 13 Global Step: 164370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:32,353-Speed 3329.38 samples/sec Loss 2.4460 LearningRate 0.0114 Epoch: 13 Global Step: 164380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:58:35,413-Speed 3346.97 samples/sec Loss 2.4185 LearningRate 0.0114 Epoch: 13 Global Step: 164390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:38,528-Speed 3287.82 samples/sec Loss 2.4192 LearningRate 0.0114 Epoch: 13 Global Step: 164400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:41,736-Speed 3193.62 samples/sec Loss 2.4072 LearningRate 0.0114 Epoch: 13 Global Step: 164410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:44,871-Speed 3267.19 samples/sec Loss 2.4433 LearningRate 0.0114 Epoch: 13 Global Step: 164420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:47,990-Speed 3283.70 samples/sec Loss 
2.4147 LearningRate 0.0114 Epoch: 13 Global Step: 164430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:51,091-Speed 3303.65 samples/sec Loss 2.3619 LearningRate 0.0114 Epoch: 13 Global Step: 164440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:54,234-Speed 3258.95 samples/sec Loss 2.4686 LearningRate 0.0114 Epoch: 13 Global Step: 164450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:58:57,299-Speed 3342.22 samples/sec Loss 2.4365 LearningRate 0.0114 Epoch: 13 Global Step: 164460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:59:00,385-Speed 3318.98 samples/sec Loss 2.4944 LearningRate 0.0114 Epoch: 13 Global Step: 164470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:59:03,460-Speed 3331.55 samples/sec Loss 2.4745 LearningRate 0.0114 Epoch: 13 Global Step: 164480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 15:59:06,619-Speed 3242.40 samples/sec Loss 2.4806 LearningRate 0.0114 Epoch: 13 Global Step: 164490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:59:09,725-Speed 3296.81 samples/sec Loss 2.4029 LearningRate 0.0114 Epoch: 13 Global Step: 164500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:59:12,830-Speed 3300.16 samples/sec Loss 2.4361 LearningRate 0.0114 Epoch: 13 Global Step: 164510 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:15,962-Speed 3269.57 samples/sec Loss 2.4540 LearningRate 0.0114 Epoch: 13 Global Step: 164520 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:19,054-Speed 3312.94 samples/sec Loss 2.4826 LearningRate 0.0114 Epoch: 13 Global Step: 164530 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:22,145-Speed 3314.40 samples/sec Loss 2.4256 LearningRate 0.0114 Epoch: 13 Global Step: 164540 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:25,225-Speed 3325.67 samples/sec Loss 2.3900 LearningRate 0.0114 Epoch: 13 Global 
Step: 164550 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:28,309-Speed 3320.99 samples/sec Loss 2.4624 LearningRate 0.0114 Epoch: 13 Global Step: 164560 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:31,432-Speed 3280.50 samples/sec Loss 2.4309 LearningRate 0.0114 Epoch: 13 Global Step: 164570 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:34,541-Speed 3294.97 samples/sec Loss 2.4720 LearningRate 0.0114 Epoch: 13 Global Step: 164580 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:37,695-Speed 3247.43 samples/sec Loss 2.4497 LearningRate 0.0114 Epoch: 13 Global Step: 164590 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:40,840-Speed 3257.18 samples/sec Loss 2.4000 LearningRate 0.0114 Epoch: 13 Global Step: 164600 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 15:59:43,916-Speed 3329.18 samples/sec Loss 2.5173 LearningRate 0.0114 Epoch: 13 Global Step: 164610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:59:47,006-Speed 3315.30 samples/sec Loss 2.4064 LearningRate 0.0114 Epoch: 13 Global Step: 164620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:59:50,075-Speed 3337.31 samples/sec Loss 2.4574 LearningRate 0.0114 Epoch: 13 Global Step: 164630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:59:53,156-Speed 3324.82 samples/sec Loss 2.4269 LearningRate 0.0114 Epoch: 13 Global Step: 164640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:59:56,229-Speed 3333.10 samples/sec Loss 2.4330 LearningRate 0.0114 Epoch: 13 Global Step: 164650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 15:59:59,305-Speed 3329.81 samples/sec Loss 2.4571 LearningRate 0.0114 Epoch: 13 Global Step: 164660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:02,476-Speed 3230.74 samples/sec Loss 2.4993 LearningRate 0.0114 Epoch: 13 Global Step: 164670 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2022-04-27 16:00:05,537-Speed 3346.62 samples/sec Loss 2.4240 LearningRate 0.0114 Epoch: 13 Global Step: 164680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:08,626-Speed 3315.13 samples/sec Loss 2.5237 LearningRate 0.0114 Epoch: 13 Global Step: 164690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:11,794-Speed 3233.85 samples/sec Loss 2.4328 LearningRate 0.0114 Epoch: 13 Global Step: 164700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:14,933-Speed 3262.86 samples/sec Loss 2.4491 LearningRate 0.0114 Epoch: 13 Global Step: 164710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:18,041-Speed 3296.28 samples/sec Loss 2.4519 LearningRate 0.0114 Epoch: 13 Global Step: 164720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:21,101-Speed 3347.09 samples/sec Loss 2.4417 LearningRate 0.0113 Epoch: 13 Global Step: 164730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:24,151-Speed 3358.54 samples/sec Loss 2.5046 LearningRate 0.0113 Epoch: 13 Global Step: 164740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:27,249-Speed 3306.61 samples/sec Loss 2.4463 LearningRate 0.0113 Epoch: 13 Global Step: 164750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:30,347-Speed 3305.75 samples/sec Loss 2.5118 LearningRate 0.0113 Epoch: 13 Global Step: 164760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:33,431-Speed 3321.90 samples/sec Loss 2.4410 LearningRate 0.0113 Epoch: 13 Global Step: 164770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:36,559-Speed 3274.38 samples/sec Loss 2.4996 LearningRate 0.0113 Epoch: 13 Global Step: 164780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:00:39,703-Speed 3258.30 samples/sec Loss 2.4536 LearningRate 0.0113 Epoch: 13 Global Step: 164790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
16:00:42,754-Speed 3360.88 samples/sec Loss 2.4423 LearningRate 0.0113 Epoch: 13 Global Step: 164800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:45,829-Speed 3331.03 samples/sec Loss 2.4216 LearningRate 0.0113 Epoch: 13 Global Step: 164810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:48,981-Speed 3250.20 samples/sec Loss 2.4037 LearningRate 0.0113 Epoch: 13 Global Step: 164820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:52,107-Speed 3276.77 samples/sec Loss 2.4899 LearningRate 0.0113 Epoch: 13 Global Step: 164830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:55,284-Speed 3224.14 samples/sec Loss 2.5222 LearningRate 0.0113 Epoch: 13 Global Step: 164840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:00:58,421-Speed 3265.42 samples/sec Loss 2.4632 LearningRate 0.0113 Epoch: 13 Global Step: 164850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:01,578-Speed 3244.75 samples/sec Loss 2.4408 LearningRate 0.0113 Epoch: 13 Global Step: 164860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:04,776-Speed 3203.20 samples/sec Loss 2.5258 LearningRate 0.0113 Epoch: 13 Global Step: 164870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:07,889-Speed 3289.70 samples/sec Loss 2.5330 LearningRate 0.0113 Epoch: 13 Global Step: 164880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:10,979-Speed 3314.76 samples/sec Loss 2.4550 LearningRate 0.0113 Epoch: 13 Global Step: 164890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:14,160-Speed 3220.39 samples/sec Loss 2.5204 LearningRate 0.0113 Epoch: 13 Global Step: 164900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:01:17,265-Speed 3299.31 samples/sec Loss 2.4069 LearningRate 0.0113 Epoch: 13 Global Step: 164910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:01:20,316-Speed 3356.95 samples/sec Loss 
2.4190 LearningRate 0.0113 Epoch: 13 Global Step: 164920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:01:23,451-Speed 3267.34 samples/sec Loss 2.4874 LearningRate 0.0113 Epoch: 13 Global Step: 164930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:01:26,576-Speed 3277.76 samples/sec Loss 2.4236 LearningRate 0.0113 Epoch: 13 Global Step: 164940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:01:29,732-Speed 3245.70 samples/sec Loss 2.5223 LearningRate 0.0113 Epoch: 13 Global Step: 164950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:32,859-Speed 3276.28 samples/sec Loss 2.4456 LearningRate 0.0113 Epoch: 13 Global Step: 164960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:35,971-Speed 3290.85 samples/sec Loss 2.4459 LearningRate 0.0113 Epoch: 13 Global Step: 164970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:39,102-Speed 3272.11 samples/sec Loss 2.4542 LearningRate 0.0113 Epoch: 13 Global Step: 164980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:42,272-Speed 3231.14 samples/sec Loss 2.4870 LearningRate 0.0113 Epoch: 13 Global Step: 164990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:45,330-Speed 3349.74 samples/sec Loss 2.4702 LearningRate 0.0113 Epoch: 13 Global Step: 165000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:48,448-Speed 3285.30 samples/sec Loss 2.4550 LearningRate 0.0113 Epoch: 13 Global Step: 165010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:51,625-Speed 3223.75 samples/sec Loss 2.5083 LearningRate 0.0113 Epoch: 13 Global Step: 165020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:54,756-Speed 3271.82 samples/sec Loss 2.4532 LearningRate 0.0113 Epoch: 13 Global Step: 165030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:01:57,843-Speed 3317.79 samples/sec Loss 2.3983 LearningRate 0.0113 Epoch: 13 Global 
Step: 165040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:00,957-Speed 3289.82 samples/sec Loss 2.4856 LearningRate 0.0113 Epoch: 13 Global Step: 165050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:04,065-Speed 3295.97 samples/sec Loss 2.4053 LearningRate 0.0113 Epoch: 13 Global Step: 165060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:07,213-Speed 3253.47 samples/sec Loss 2.5028 LearningRate 0.0113 Epoch: 13 Global Step: 165070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:10,286-Speed 3333.83 samples/sec Loss 2.4839 LearningRate 0.0113 Epoch: 13 Global Step: 165080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:13,460-Speed 3226.71 samples/sec Loss 2.4573 LearningRate 0.0113 Epoch: 13 Global Step: 165090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:16,594-Speed 3269.38 samples/sec Loss 2.5470 LearningRate 0.0112 Epoch: 13 Global Step: 165100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:19,697-Speed 3301.09 samples/sec Loss 2.5070 LearningRate 0.0112 Epoch: 13 Global Step: 165110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:22,798-Speed 3303.67 samples/sec Loss 2.5042 LearningRate 0.0112 Epoch: 13 Global Step: 165120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:25,952-Speed 3247.04 samples/sec Loss 2.5700 LearningRate 0.0112 Epoch: 13 Global Step: 165130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:29,061-Speed 3294.85 samples/sec Loss 2.4610 LearningRate 0.0112 Epoch: 13 Global Step: 165140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:32,176-Speed 3288.32 samples/sec Loss 2.4815 LearningRate 0.0112 Epoch: 13 Global Step: 165150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:35,223-Speed 3361.61 samples/sec Loss 2.4760 LearningRate 0.0112 Epoch: 13 Global Step: 165160 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 16:02:38,300-Speed 3329.31 samples/sec Loss 2.4801 LearningRate 0.0112 Epoch: 13 Global Step: 165170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:02:41,358-Speed 3349.28 samples/sec Loss 2.4965 LearningRate 0.0112 Epoch: 13 Global Step: 165180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:44,446-Speed 3317.16 samples/sec Loss 2.4580 LearningRate 0.0112 Epoch: 13 Global Step: 165190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:47,530-Speed 3320.84 samples/sec Loss 2.5041 LearningRate 0.0112 Epoch: 13 Global Step: 165200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:50,661-Speed 3271.67 samples/sec Loss 2.4687 LearningRate 0.0112 Epoch: 13 Global Step: 165210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:53,754-Speed 3311.85 samples/sec Loss 2.4768 LearningRate 0.0112 Epoch: 13 Global Step: 165220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:56,823-Speed 3338.13 samples/sec Loss 2.5481 LearningRate 0.0112 Epoch: 13 Global Step: 165230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:02:59,909-Speed 3319.37 samples/sec Loss 2.4442 LearningRate 0.0112 Epoch: 13 Global Step: 165240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:03,000-Speed 3314.11 samples/sec Loss 2.4789 LearningRate 0.0112 Epoch: 13 Global Step: 165250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:06,136-Speed 3265.96 samples/sec Loss 2.4834 LearningRate 0.0112 Epoch: 13 Global Step: 165260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:09,196-Speed 3347.20 samples/sec Loss 2.5521 LearningRate 0.0112 Epoch: 13 Global Step: 165270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:12,341-Speed 3257.02 samples/sec Loss 2.4917 LearningRate 0.0112 Epoch: 13 Global Step: 165280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
16:03:15,447-Speed 3297.61 samples/sec Loss 2.4861 LearningRate 0.0112 Epoch: 13 Global Step: 165290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:18,527-Speed 3326.53 samples/sec Loss 2.4458 LearningRate 0.0112 Epoch: 13 Global Step: 165300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:21,585-Speed 3349.10 samples/sec Loss 2.4755 LearningRate 0.0112 Epoch: 13 Global Step: 165310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:24,644-Speed 3348.36 samples/sec Loss 2.4849 LearningRate 0.0112 Epoch: 13 Global Step: 165320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:27,747-Speed 3301.23 samples/sec Loss 2.4630 LearningRate 0.0112 Epoch: 13 Global Step: 165330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:30,810-Speed 3344.50 samples/sec Loss 2.5012 LearningRate 0.0112 Epoch: 13 Global Step: 165340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:33,863-Speed 3355.19 samples/sec Loss 2.4785 LearningRate 0.0112 Epoch: 13 Global Step: 165350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:37,054-Speed 3209.87 samples/sec Loss 2.5897 LearningRate 0.0112 Epoch: 13 Global Step: 165360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:40,246-Speed 3208.78 samples/sec Loss 2.5037 LearningRate 0.0112 Epoch: 13 Global Step: 165370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:43,315-Speed 3337.70 samples/sec Loss 2.4552 LearningRate 0.0112 Epoch: 13 Global Step: 165380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:03:46,404-Speed 3315.83 samples/sec Loss 2.5289 LearningRate 0.0112 Epoch: 13 Global Step: 165390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:49,554-Speed 3251.60 samples/sec Loss 2.4946 LearningRate 0.0112 Epoch: 13 Global Step: 165400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:52,685-Speed 3272.53 samples/sec Loss 
2.3998 LearningRate 0.0112 Epoch: 13 Global Step: 165410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:55,844-Speed 3242.20 samples/sec Loss 2.4721 LearningRate 0.0112 Epoch: 13 Global Step: 165420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:03:58,910-Speed 3341.26 samples/sec Loss 2.5173 LearningRate 0.0112 Epoch: 13 Global Step: 165430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:02,050-Speed 3262.28 samples/sec Loss 2.5348 LearningRate 0.0112 Epoch: 13 Global Step: 165440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:05,190-Speed 3262.45 samples/sec Loss 2.4755 LearningRate 0.0112 Epoch: 13 Global Step: 165450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:08,292-Speed 3301.63 samples/sec Loss 2.4931 LearningRate 0.0112 Epoch: 13 Global Step: 165460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:11,360-Speed 3339.16 samples/sec Loss 2.4840 LearningRate 0.0111 Epoch: 13 Global Step: 165470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:14,522-Speed 3239.99 samples/sec Loss 2.5206 LearningRate 0.0111 Epoch: 13 Global Step: 165480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:17,610-Speed 3316.19 samples/sec Loss 2.5093 LearningRate 0.0111 Epoch: 13 Global Step: 165490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:20,701-Speed 3314.89 samples/sec Loss 2.5787 LearningRate 0.0111 Epoch: 13 Global Step: 165500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:23,813-Speed 3291.46 samples/sec Loss 2.5343 LearningRate 0.0111 Epoch: 13 Global Step: 165510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:26,926-Speed 3290.60 samples/sec Loss 2.5601 LearningRate 0.0111 Epoch: 13 Global Step: 165520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:30,117-Speed 3209.64 samples/sec Loss 2.5235 LearningRate 0.0111 Epoch: 13 Global 
Step: 165530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:33,206-Speed 3316.62 samples/sec Loss 2.4690 LearningRate 0.0111 Epoch: 13 Global Step: 165540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:36,332-Speed 3276.21 samples/sec Loss 2.5120 LearningRate 0.0111 Epoch: 13 Global Step: 165550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:39,502-Speed 3232.28 samples/sec Loss 2.5174 LearningRate 0.0111 Epoch: 13 Global Step: 165560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:42,601-Speed 3305.10 samples/sec Loss 2.5371 LearningRate 0.0111 Epoch: 13 Global Step: 165570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:45,661-Speed 3347.62 samples/sec Loss 2.4526 LearningRate 0.0111 Epoch: 13 Global Step: 165580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:04:48,789-Speed 3273.95 samples/sec Loss 2.4741 LearningRate 0.0111 Epoch: 13 Global Step: 165590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:51,876-Speed 3318.39 samples/sec Loss 2.5211 LearningRate 0.0111 Epoch: 13 Global Step: 165600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:55,007-Speed 3271.58 samples/sec Loss 2.4772 LearningRate 0.0111 Epoch: 13 Global Step: 165610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:04:58,083-Speed 3330.11 samples/sec Loss 2.4879 LearningRate 0.0111 Epoch: 13 Global Step: 165620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:01,145-Speed 3345.72 samples/sec Loss 2.5529 LearningRate 0.0111 Epoch: 13 Global Step: 165630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:04,238-Speed 3311.70 samples/sec Loss 2.5150 LearningRate 0.0111 Epoch: 13 Global Step: 165640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:07,360-Speed 3280.73 samples/sec Loss 2.5631 LearningRate 0.0111 Epoch: 13 Global Step: 165650 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 16:05:10,467-Speed 3296.76 samples/sec Loss 2.4926 LearningRate 0.0111 Epoch: 13 Global Step: 165660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:13,600-Speed 3270.35 samples/sec Loss 2.4747 LearningRate 0.0111 Epoch: 13 Global Step: 165670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:16,674-Speed 3331.84 samples/sec Loss 2.5005 LearningRate 0.0111 Epoch: 13 Global Step: 165680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:19,725-Speed 3356.86 samples/sec Loss 2.4770 LearningRate 0.0111 Epoch: 13 Global Step: 165690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:22,802-Speed 3329.68 samples/sec Loss 2.5140 LearningRate 0.0111 Epoch: 13 Global Step: 165700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:26,027-Speed 3176.24 samples/sec Loss 2.5045 LearningRate 0.0111 Epoch: 13 Global Step: 165710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:29,167-Speed 3261.73 samples/sec Loss 2.4653 LearningRate 0.0111 Epoch: 13 Global Step: 165720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:32,272-Speed 3299.33 samples/sec Loss 2.5508 LearningRate 0.0111 Epoch: 13 Global Step: 165730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:35,366-Speed 3310.99 samples/sec Loss 2.5585 LearningRate 0.0111 Epoch: 13 Global Step: 165740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:05:38,501-Speed 3266.54 samples/sec Loss 2.5255 LearningRate 0.0111 Epoch: 13 Global Step: 165750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:41,596-Speed 3310.21 samples/sec Loss 2.5708 LearningRate 0.0111 Epoch: 13 Global Step: 165760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:44,689-Speed 3311.19 samples/sec Loss 2.4853 LearningRate 0.0111 Epoch: 13 Global Step: 165770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
16:05:47,814-Speed 3278.14 samples/sec Loss 2.5367 LearningRate 0.0111 Epoch: 13 Global Step: 165780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:51,018-Speed 3196.65 samples/sec Loss 2.5237 LearningRate 0.0111 Epoch: 13 Global Step: 165790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:54,085-Speed 3339.73 samples/sec Loss 2.4960 LearningRate 0.0111 Epoch: 13 Global Step: 165800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:05:57,126-Speed 3368.97 samples/sec Loss 2.5380 LearningRate 0.0111 Epoch: 13 Global Step: 165810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:00,314-Speed 3213.18 samples/sec Loss 2.4569 LearningRate 0.0111 Epoch: 13 Global Step: 165820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:03,499-Speed 3215.67 samples/sec Loss 2.6161 LearningRate 0.0111 Epoch: 13 Global Step: 165830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:06,558-Speed 3348.92 samples/sec Loss 2.5080 LearningRate 0.0111 Epoch: 13 Global Step: 165840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:09,641-Speed 3323.25 samples/sec Loss 2.4816 LearningRate 0.0110 Epoch: 13 Global Step: 165850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:12,785-Speed 3257.99 samples/sec Loss 2.4708 LearningRate 0.0110 Epoch: 13 Global Step: 165860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:15,884-Speed 3304.40 samples/sec Loss 2.6045 LearningRate 0.0110 Epoch: 13 Global Step: 165870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:19,010-Speed 3277.32 samples/sec Loss 2.4656 LearningRate 0.0110 Epoch: 13 Global Step: 165880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:22,108-Speed 3306.57 samples/sec Loss 2.4896 LearningRate 0.0110 Epoch: 13 Global Step: 165890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:25,200-Speed 3312.95 samples/sec Loss 
2.5293 LearningRate 0.0110 Epoch: 13 Global Step: 165900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:28,288-Speed 3317.03 samples/sec Loss 2.5162 LearningRate 0.0110 Epoch: 13 Global Step: 165910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:06:31,496-Speed 3193.13 samples/sec Loss 2.5043 LearningRate 0.0110 Epoch: 13 Global Step: 165920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:06:34,569-Speed 3333.26 samples/sec Loss 2.4967 LearningRate 0.0110 Epoch: 13 Global Step: 165930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:06:37,726-Speed 3243.47 samples/sec Loss 2.5297 LearningRate 0.0110 Epoch: 13 Global Step: 165940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:06:40,859-Speed 3269.90 samples/sec Loss 2.4911 LearningRate 0.0110 Epoch: 13 Global Step: 165950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:06:43,960-Speed 3302.80 samples/sec Loss 2.5185 LearningRate 0.0110 Epoch: 13 Global Step: 165960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:06:46,998-Speed 3372.29 samples/sec Loss 2.4729 LearningRate 0.0110 Epoch: 13 Global Step: 165970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:50,083-Speed 3320.23 samples/sec Loss 2.5671 LearningRate 0.0110 Epoch: 13 Global Step: 165980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:53,168-Speed 3319.85 samples/sec Loss 2.5611 LearningRate 0.0110 Epoch: 13 Global Step: 165990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:56,285-Speed 3287.06 samples/sec Loss 2.5201 LearningRate 0.0110 Epoch: 13 Global Step: 166000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:06:59,397-Speed 3290.71 samples/sec Loss 2.5264 LearningRate 0.0110 Epoch: 13 Global Step: 166010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:02,518-Speed 3281.89 samples/sec Loss 2.5051 LearningRate 0.0110 Epoch: 13 Global 
Step: 166020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:05,608-Speed 3315.01 samples/sec Loss 2.5606 LearningRate 0.0110 Epoch: 13 Global Step: 166030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:08,705-Speed 3307.95 samples/sec Loss 2.5340 LearningRate 0.0110 Epoch: 13 Global Step: 166040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:11,827-Speed 3280.61 samples/sec Loss 2.5137 LearningRate 0.0110 Epoch: 13 Global Step: 166050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:14,988-Speed 3240.39 samples/sec Loss 2.4376 LearningRate 0.0110 Epoch: 13 Global Step: 166060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:18,065-Speed 3328.87 samples/sec Loss 2.6261 LearningRate 0.0110 Epoch: 13 Global Step: 166070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:07:21,150-Speed 3319.85 samples/sec Loss 2.5197 LearningRate 0.0110 Epoch: 13 Global Step: 166080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:07:24,258-Speed 3295.51 samples/sec Loss 2.5766 LearningRate 0.0110 Epoch: 13 Global Step: 166090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:07:27,415-Speed 3245.38 samples/sec Loss 2.5346 LearningRate 0.0110 Epoch: 13 Global Step: 166100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:07:30,551-Speed 3266.40 samples/sec Loss 2.5223 LearningRate 0.0110 Epoch: 13 Global Step: 166110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:33,703-Speed 3248.78 samples/sec Loss 2.4968 LearningRate 0.0110 Epoch: 13 Global Step: 166120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:36,890-Speed 3214.73 samples/sec Loss 2.5355 LearningRate 0.0110 Epoch: 13 Global Step: 166130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:40,032-Speed 3259.64 samples/sec Loss 2.6010 LearningRate 0.0110 Epoch: 13 Global Step: 166140 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 16:07:43,178-Speed 3256.45 samples/sec Loss 2.5393 LearningRate 0.0110 Epoch: 13 Global Step: 166150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:46,285-Speed 3296.68 samples/sec Loss 2.5171 LearningRate 0.0110 Epoch: 13 Global Step: 166160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:49,369-Speed 3321.34 samples/sec Loss 2.5398 LearningRate 0.0110 Epoch: 13 Global Step: 166170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:52,464-Speed 3310.16 samples/sec Loss 2.4324 LearningRate 0.0110 Epoch: 13 Global Step: 166180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:55,570-Speed 3297.32 samples/sec Loss 2.5396 LearningRate 0.0110 Epoch: 13 Global Step: 166190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:07:58,697-Speed 3276.29 samples/sec Loss 2.4816 LearningRate 0.0110 Epoch: 13 Global Step: 166200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:01,810-Speed 3290.39 samples/sec Loss 2.5284 LearningRate 0.0110 Epoch: 13 Global Step: 166210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:08:04,946-Speed 3265.72 samples/sec Loss 2.5179 LearningRate 0.0109 Epoch: 13 Global Step: 166220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:08,106-Speed 3242.07 samples/sec Loss 2.5128 LearningRate 0.0109 Epoch: 13 Global Step: 166230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:11,247-Speed 3260.92 samples/sec Loss 2.4885 LearningRate 0.0109 Epoch: 13 Global Step: 166240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:14,347-Speed 3303.60 samples/sec Loss 2.4699 LearningRate 0.0109 Epoch: 13 Global Step: 166250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:17,508-Speed 3241.37 samples/sec Loss 2.5372 LearningRate 0.0109 Epoch: 13 Global Step: 166260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
16:08:20,608-Speed 3304.25 samples/sec Loss 2.5666 LearningRate 0.0109 Epoch: 13 Global Step: 166270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:23,699-Speed 3313.46 samples/sec Loss 2.4651 LearningRate 0.0109 Epoch: 13 Global Step: 166280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:26,801-Speed 3302.35 samples/sec Loss 2.5053 LearningRate 0.0109 Epoch: 13 Global Step: 166290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:29,921-Speed 3283.42 samples/sec Loss 2.5285 LearningRate 0.0109 Epoch: 13 Global Step: 166300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:33,011-Speed 3315.18 samples/sec Loss 2.5052 LearningRate 0.0109 Epoch: 13 Global Step: 166310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:36,092-Speed 3323.85 samples/sec Loss 2.4781 LearningRate 0.0109 Epoch: 13 Global Step: 166320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:08:39,192-Speed 3304.93 samples/sec Loss 2.4471 LearningRate 0.0109 Epoch: 13 Global Step: 166330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:08:42,351-Speed 3242.26 samples/sec Loss 2.6046 LearningRate 0.0109 Epoch: 13 Global Step: 166340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:08:45,441-Speed 3315.47 samples/sec Loss 2.4692 LearningRate 0.0109 Epoch: 13 Global Step: 166350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:08:48,609-Speed 3233.23 samples/sec Loss 2.5573 LearningRate 0.0109 Epoch: 13 Global Step: 166360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:08:51,792-Speed 3218.40 samples/sec Loss 2.5779 LearningRate 0.0109 Epoch: 13 Global Step: 166370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:08:54,907-Speed 3288.66 samples/sec Loss 2.5002 LearningRate 0.0109 Epoch: 13 Global Step: 166380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:08:57,980-Speed 3332.92 samples/sec Loss 
2.5900 LearningRate 0.0109 Epoch: 13 Global Step: 166390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:01,079-Speed 3304.73 samples/sec Loss 2.5637 LearningRate 0.0109 Epoch: 13 Global Step: 166400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:04,161-Speed 3323.55 samples/sec Loss 2.5648 LearningRate 0.0109 Epoch: 13 Global Step: 166410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:07,294-Speed 3269.56 samples/sec Loss 2.5701 LearningRate 0.0109 Epoch: 13 Global Step: 166420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:10,391-Speed 3308.22 samples/sec Loss 2.5655 LearningRate 0.0109 Epoch: 13 Global Step: 166430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:13,511-Speed 3282.95 samples/sec Loss 2.5855 LearningRate 0.0109 Epoch: 13 Global Step: 166440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:16,634-Speed 3279.60 samples/sec Loss 2.5498 LearningRate 0.0109 Epoch: 13 Global Step: 166450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:19,733-Speed 3304.80 samples/sec Loss 2.5285 LearningRate 0.0109 Epoch: 13 Global Step: 166460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:22,826-Speed 3312.64 samples/sec Loss 2.6109 LearningRate 0.0109 Epoch: 13 Global Step: 166470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:25,963-Speed 3265.09 samples/sec Loss 2.6344 LearningRate 0.0109 Epoch: 13 Global Step: 166480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:09:29,081-Speed 3284.96 samples/sec Loss 2.5982 LearningRate 0.0109 Epoch: 13 Global Step: 166490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:09:32,207-Speed 3277.14 samples/sec Loss 2.5486 LearningRate 0.0109 Epoch: 13 Global Step: 166500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:09:35,305-Speed 3306.80 samples/sec Loss 2.5507 LearningRate 0.0109 Epoch: 13 Global 
Step: 166510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:09:38,476-Speed 3229.89 samples/sec Loss 2.5519 LearningRate 0.0109 Epoch: 13 Global Step: 166520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:41,582-Speed 3297.89 samples/sec Loss 2.5330 LearningRate 0.0109 Epoch: 13 Global Step: 166530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:44,686-Speed 3298.96 samples/sec Loss 2.6060 LearningRate 0.0109 Epoch: 13 Global Step: 166540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:47,764-Speed 3328.36 samples/sec Loss 2.4996 LearningRate 0.0109 Epoch: 13 Global Step: 166550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:50,894-Speed 3272.71 samples/sec Loss 2.5575 LearningRate 0.0109 Epoch: 13 Global Step: 166560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:54,022-Speed 3274.35 samples/sec Loss 2.5718 LearningRate 0.0109 Epoch: 13 Global Step: 166570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:09:57,103-Speed 3324.40 samples/sec Loss 2.5333 LearningRate 0.0109 Epoch: 13 Global Step: 166580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:10:00,209-Speed 3298.44 samples/sec Loss 2.5257 LearningRate 0.0109 Epoch: 13 Global Step: 166590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:10:03,325-Speed 3287.28 samples/sec Loss 2.5530 LearningRate 0.0108 Epoch: 13 Global Step: 166600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:10:06,402-Speed 3328.42 samples/sec Loss 2.5480 LearningRate 0.0108 Epoch: 13 Global Step: 166610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:10:09,521-Speed 3284.27 samples/sec Loss 2.6016 LearningRate 0.0108 Epoch: 13 Global Step: 166620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:12,666-Speed 3257.14 samples/sec Loss 2.5540 LearningRate 0.0108 Epoch: 13 Global Step: 166630 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 16:10:15,806-Speed 3262.16 samples/sec Loss 2.4816 LearningRate 0.0108 Epoch: 13 Global Step: 166640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:18,941-Speed 3267.44 samples/sec Loss 2.5814 LearningRate 0.0108 Epoch: 13 Global Step: 166650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:22,007-Speed 3341.10 samples/sec Loss 2.5258 LearningRate 0.0108 Epoch: 13 Global Step: 166660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:25,115-Speed 3295.50 samples/sec Loss 2.5291 LearningRate 0.0108 Epoch: 13 Global Step: 166670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:28,230-Speed 3289.31 samples/sec Loss 2.5167 LearningRate 0.0108 Epoch: 13 Global Step: 166680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:31,353-Speed 3279.40 samples/sec Loss 2.5752 LearningRate 0.0108 Epoch: 13 Global Step: 166690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:34,439-Speed 3319.82 samples/sec Loss 2.5723 LearningRate 0.0108 Epoch: 13 Global Step: 166700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:37,512-Speed 3332.89 samples/sec Loss 2.4298 LearningRate 0.0108 Epoch: 13 Global Step: 166710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:40,613-Speed 3303.32 samples/sec Loss 2.5311 LearningRate 0.0108 Epoch: 13 Global Step: 166720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 16:10:43,714-Speed 3303.34 samples/sec Loss 2.5199 LearningRate 0.0108 Epoch: 13 Global Step: 166730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:46,787-Speed 3332.67 samples/sec Loss 2.5172 LearningRate 0.0108 Epoch: 13 Global Step: 166740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:49,859-Speed 3335.13 samples/sec Loss 2.5454 LearningRate 0.0108 Epoch: 13 Global Step: 166750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
16:10:52,973-Speed 3289.34 samples/sec Loss 2.5665 LearningRate 0.0108 Epoch: 13 Global Step: 166760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:10:56,044-Speed 3335.20 samples/sec Loss 2.6055 LearningRate 0.0108 Epoch: 13 Global Step: 166770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:10:59,132-Speed 3316.55 samples/sec Loss 2.5857 LearningRate 0.0108 Epoch: 13 Global Step: 166780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:11:02,239-Speed 3297.56 samples/sec Loss 2.6290 LearningRate 0.0108 Epoch: 13 Global Step: 166790 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:05,345-Speed 3297.71 samples/sec Loss 2.5364 LearningRate 0.0108 Epoch: 13 Global Step: 166800 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:08,405-Speed 3347.02 samples/sec Loss 2.5116 LearningRate 0.0108 Epoch: 13 Global Step: 166810 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:11,497-Speed 3312.52 samples/sec Loss 2.4569 LearningRate 0.0108 Epoch: 13 Global Step: 166820 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:14,598-Speed 3303.54 samples/sec Loss 2.5718 LearningRate 0.0108 Epoch: 13 Global Step: 166830 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:17,722-Speed 3279.41 samples/sec Loss 2.4610 LearningRate 0.0108 Epoch: 13 Global Step: 166840 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:20,823-Speed 3302.23 samples/sec Loss 2.5493 LearningRate 0.0108 Epoch: 13 Global Step: 166850 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:23,951-Speed 3275.29 samples/sec Loss 2.5624 LearningRate 0.0108 Epoch: 13 Global Step: 166860 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:27,048-Speed 3307.90 samples/sec Loss 2.5459 LearningRate 0.0108 Epoch: 13 Global Step: 166870 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:30,136-Speed 3316.65 samples/sec Loss 2.5242 
LearningRate 0.0108 Epoch: 13 Global Step: 166880 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:33,233-Speed 3307.08 samples/sec Loss 2.5791 LearningRate 0.0108 Epoch: 13 Global Step: 166890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:11:36,350-Speed 3286.33 samples/sec Loss 2.5157 LearningRate 0.0108 Epoch: 13 Global Step: 166900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:11:39,428-Speed 3328.48 samples/sec Loss 2.5373 LearningRate 0.0108 Epoch: 13 Global Step: 166910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:11:42,512-Speed 3321.32 samples/sec Loss 2.5459 LearningRate 0.0108 Epoch: 13 Global Step: 166920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:11:45,570-Speed 3349.20 samples/sec Loss 2.5029 LearningRate 0.0108 Epoch: 13 Global Step: 166930 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:48,667-Speed 3306.69 samples/sec Loss 2.6073 LearningRate 0.0108 Epoch: 13 Global Step: 166940 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:51,836-Speed 3232.57 samples/sec Loss 2.5611 LearningRate 0.0108 Epoch: 13 Global Step: 166950 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:54,955-Speed 3284.20 samples/sec Loss 2.6017 LearningRate 0.0108 Epoch: 13 Global Step: 166960 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:11:58,019-Speed 3342.80 samples/sec Loss 2.5776 LearningRate 0.0108 Epoch: 13 Global Step: 166970 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:12:01,116-Speed 3307.63 samples/sec Loss 2.5341 LearningRate 0.0107 Epoch: 13 Global Step: 166980 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:12:04,262-Speed 3256.38 samples/sec Loss 2.5826 LearningRate 0.0107 Epoch: 13 Global Step: 166990 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:12:07,364-Speed 3301.62 samples/sec Loss 2.6223 LearningRate 0.0107 Epoch: 13 Global Step: 167000 
Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:12:10,461-Speed 3308.13 samples/sec Loss 2.5278 LearningRate 0.0107 Epoch: 13 Global Step: 167010 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:12:13,589-Speed 3274.37 samples/sec Loss 2.5955 LearningRate 0.0107 Epoch: 13 Global Step: 167020 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:12:16,719-Speed 3272.55 samples/sec Loss 2.6278 LearningRate 0.0107 Epoch: 13 Global Step: 167030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:19,853-Speed 3268.12 samples/sec Loss 2.6699 LearningRate 0.0107 Epoch: 13 Global Step: 167040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:22,947-Speed 3310.71 samples/sec Loss 2.5705 LearningRate 0.0107 Epoch: 13 Global Step: 167050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:26,051-Speed 3300.86 samples/sec Loss 2.4827 LearningRate 0.0107 Epoch: 13 Global Step: 167060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:29,226-Speed 3225.71 samples/sec Loss 2.5775 LearningRate 0.0107 Epoch: 13 Global Step: 167070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:32,317-Speed 3313.75 samples/sec Loss 2.5573 LearningRate 0.0107 Epoch: 13 Global Step: 167080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:35,407-Speed 3315.47 samples/sec Loss 2.6149 LearningRate 0.0107 Epoch: 13 Global Step: 167090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:38,561-Speed 3247.84 samples/sec Loss 2.4901 LearningRate 0.0107 Epoch: 13 Global Step: 167100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:41,654-Speed 3311.55 samples/sec Loss 2.4798 LearningRate 0.0107 Epoch: 13 Global Step: 167110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:12:45,378-Speed 2750.03 samples/sec Loss 2.5339 LearningRate 0.0107 Epoch: 13 Global Step: 167120 Fp16 Grad Scale: 16384 Required: 7 hours 
Training: 2022-04-27 16:12:48,483-Speed 3299.18 samples/sec Loss 2.5459 LearningRate 0.0107 Epoch: 13 Global Step: 167130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:12:51,601-Speed 3285.19 samples/sec Loss 2.5229 LearningRate 0.0107 Epoch: 13 Global Step: 167140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:12:54,770-Speed 3232.71 samples/sec Loss 2.5535 LearningRate 0.0107 Epoch: 13 Global Step: 167150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:12:57,871-Speed 3303.65 samples/sec Loss 2.5507 LearningRate 0.0107 Epoch: 13 Global Step: 167160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:13:00,985-Speed 3288.40 samples/sec Loss 2.5589 LearningRate 0.0107 Epoch: 13 Global Step: 167170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:04,081-Speed 3309.58 samples/sec Loss 2.5432 LearningRate 0.0107 Epoch: 13 Global Step: 167180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:07,206-Speed 3277.10 samples/sec Loss 2.5420 LearningRate 0.0107 Epoch: 13 Global Step: 167190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:10,273-Speed 3340.14 samples/sec Loss 2.5783 LearningRate 0.0107 Epoch: 13 Global Step: 167200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:13,383-Speed 3293.43 samples/sec Loss 2.6010 LearningRate 0.0107 Epoch: 13 Global Step: 167210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:16,468-Speed 3321.08 samples/sec Loss 2.5219 LearningRate 0.0107 Epoch: 13 Global Step: 167220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:19,600-Speed 3269.60 samples/sec Loss 2.5886 LearningRate 0.0107 Epoch: 13 Global Step: 167230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:22,694-Speed 3311.30 samples/sec Loss 2.5932 LearningRate 0.0107 Epoch: 13 Global Step: 167240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:25,770-Speed 
3330.05 samples/sec Loss 2.5745 LearningRate 0.0107 Epoch: 13 Global Step: 167250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:28,853-Speed 3322.24 samples/sec Loss 2.5030 LearningRate 0.0107 Epoch: 13 Global Step: 167260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:31,965-Speed 3292.27 samples/sec Loss 2.5387 LearningRate 0.0107 Epoch: 13 Global Step: 167270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:35,111-Speed 3255.41 samples/sec Loss 2.6085 LearningRate 0.0107 Epoch: 13 Global Step: 167280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:38,225-Speed 3289.48 samples/sec Loss 2.5936 LearningRate 0.0107 Epoch: 13 Global Step: 167290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:41,363-Speed 3264.83 samples/sec Loss 2.5167 LearningRate 0.0107 Epoch: 13 Global Step: 167300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:44,454-Speed 3313.60 samples/sec Loss 2.4925 LearningRate 0.0107 Epoch: 13 Global Step: 167310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:47,587-Speed 3269.57 samples/sec Loss 2.6063 LearningRate 0.0107 Epoch: 13 Global Step: 167320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:50,760-Speed 3227.53 samples/sec Loss 2.6203 LearningRate 0.0107 Epoch: 13 Global Step: 167330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:53,876-Speed 3287.61 samples/sec Loss 2.5161 LearningRate 0.0107 Epoch: 13 Global Step: 167340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:13:56,974-Speed 3305.86 samples/sec Loss 2.5914 LearningRate 0.0106 Epoch: 13 Global Step: 167350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:14:00,077-Speed 3300.78 samples/sec Loss 2.4960 LearningRate 0.0106 Epoch: 13 Global Step: 167360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:14:03,270-Speed 3208.71 samples/sec Loss 2.6224 
LearningRate 0.0106 Epoch: 13 Global Step: 167370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:06,360-Speed 3315.03 samples/sec Loss 2.4640 LearningRate 0.0106 Epoch: 13 Global Step: 167380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:09,425-Speed 3341.85 samples/sec Loss 2.6434 LearningRate 0.0106 Epoch: 13 Global Step: 167390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:12,562-Speed 3265.24 samples/sec Loss 2.5591 LearningRate 0.0106 Epoch: 13 Global Step: 167400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:16,248-Speed 2779.10 samples/sec Loss 2.5727 LearningRate 0.0106 Epoch: 13 Global Step: 167410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:19,350-Speed 3302.13 samples/sec Loss 2.5877 LearningRate 0.0106 Epoch: 13 Global Step: 167420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:22,427-Speed 3328.75 samples/sec Loss 2.5710 LearningRate 0.0106 Epoch: 13 Global Step: 167430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:27,564-Speed 1993.75 samples/sec Loss 2.5818 LearningRate 0.0106 Epoch: 13 Global Step: 167440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:30,726-Speed 3239.30 samples/sec Loss 2.5952 LearningRate 0.0106 Epoch: 13 Global Step: 167450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:33,800-Speed 3333.33 samples/sec Loss 2.5645 LearningRate 0.0106 Epoch: 13 Global Step: 167460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:36,934-Speed 3267.95 samples/sec Loss 2.5161 LearningRate 0.0106 Epoch: 13 Global Step: 167470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 16:14:40,003-Speed 3338.12 samples/sec Loss 2.5376 LearningRate 0.0106 Epoch: 13 Global Step: 167480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:43,088-Speed 3320.23 samples/sec Loss 2.6279 LearningRate 0.0106 Epoch: 13 Global Step: 
167490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:46,170-Speed 3323.77 samples/sec Loss 2.5710 LearningRate 0.0106 Epoch: 13 Global Step: 167500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:14:49,245-Speed 3330.43 samples/sec Loss 2.5698 LearningRate 0.0106 Epoch: 13 Global Step: 167510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:14:52,342-Speed 3308.07 samples/sec Loss 2.5490 LearningRate 0.0106 Epoch: 13 Global Step: 167520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:14:55,454-Speed 3292.01 samples/sec Loss 2.5543 LearningRate 0.0106 Epoch: 13 Global Step: 167530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:14:58,521-Speed 3339.12 samples/sec Loss 2.5712 LearningRate 0.0106 Epoch: 13 Global Step: 167540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:01,634-Speed 3290.23 samples/sec Loss 2.5265 LearningRate 0.0106 Epoch: 13 Global Step: 167550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:04,743-Speed 3295.24 samples/sec Loss 2.6089 LearningRate 0.0106 Epoch: 13 Global Step: 167560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:07,880-Speed 3264.92 samples/sec Loss 2.5564 LearningRate 0.0106 Epoch: 13 Global Step: 167570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:10,964-Speed 3321.08 samples/sec Loss 2.5852 LearningRate 0.0106 Epoch: 13 Global Step: 167580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:14,146-Speed 3219.07 samples/sec Loss 2.6136 LearningRate 0.0106 Epoch: 13 Global Step: 167590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:17,229-Speed 3322.17 samples/sec Loss 2.4999 LearningRate 0.0106 Epoch: 13 Global Step: 167600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:20,334-Speed 3299.71 samples/sec Loss 2.5368 LearningRate 0.0106 Epoch: 13 Global Step: 167610 Fp16 Grad Scale: 32768 Required: 7 
hours Training: 2022-04-27 16:15:23,401-Speed 3339.95 samples/sec Loss 2.4593 LearningRate 0.0106 Epoch: 13 Global Step: 167620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:15:26,513-Speed 3291.87 samples/sec Loss 2.5262 LearningRate 0.0106 Epoch: 13 Global Step: 167630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:15:29,607-Speed 3310.64 samples/sec Loss 2.5907 LearningRate 0.0106 Epoch: 13 Global Step: 167640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:15:32,672-Speed 3341.96 samples/sec Loss 2.5791 LearningRate 0.0106 Epoch: 13 Global Step: 167650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:15:35,757-Speed 3320.29 samples/sec Loss 2.6076 LearningRate 0.0106 Epoch: 13 Global Step: 167660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:15:38,840-Speed 3322.59 samples/sec Loss 2.6345 LearningRate 0.0106 Epoch: 13 Global Step: 167670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:15:41,982-Speed 3259.63 samples/sec Loss 2.5863 LearningRate 0.0106 Epoch: 13 Global Step: 167680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:15:45,025-Speed 3367.11 samples/sec Loss 2.4686 LearningRate 0.0106 Epoch: 13 Global Step: 167690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:48,100-Speed 3330.66 samples/sec Loss 2.5679 LearningRate 0.0106 Epoch: 13 Global Step: 167700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:51,148-Speed 3360.86 samples/sec Loss 2.5982 LearningRate 0.0106 Epoch: 13 Global Step: 167710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:54,228-Speed 3325.55 samples/sec Loss 2.5608 LearningRate 0.0106 Epoch: 13 Global Step: 167720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:15:57,283-Speed 3353.50 samples/sec Loss 2.6408 LearningRate 0.0106 Epoch: 13 Global Step: 167730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
16:16:00,468-Speed 3215.80 samples/sec Loss 2.5352 LearningRate 0.0105 Epoch: 13 Global Step: 167740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:03,584-Speed 3287.63 samples/sec Loss 2.6016 LearningRate 0.0105 Epoch: 13 Global Step: 167750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:06,716-Speed 3270.23 samples/sec Loss 2.5853 LearningRate 0.0105 Epoch: 13 Global Step: 167760 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:09,784-Speed 3339.28 samples/sec Loss 2.4769 LearningRate 0.0105 Epoch: 13 Global Step: 167770 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:12,859-Speed 3330.42 samples/sec Loss 2.5922 LearningRate 0.0105 Epoch: 13 Global Step: 167780 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:15,940-Speed 3325.04 samples/sec Loss 2.6095 LearningRate 0.0105 Epoch: 13 Global Step: 167790 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:19,052-Speed 3291.98 samples/sec Loss 2.6424 LearningRate 0.0105 Epoch: 13 Global Step: 167800 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:22,117-Speed 3341.38 samples/sec Loss 2.5869 LearningRate 0.0105 Epoch: 13 Global Step: 167810 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:25,227-Speed 3294.17 samples/sec Loss 2.6689 LearningRate 0.0105 Epoch: 13 Global Step: 167820 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:28,325-Speed 3306.55 samples/sec Loss 2.6424 LearningRate 0.0105 Epoch: 13 Global Step: 167830 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:31,450-Speed 3277.16 samples/sec Loss 2.6341 LearningRate 0.0105 Epoch: 13 Global Step: 167840 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:34,561-Speed 3293.13 samples/sec Loss 2.6111 LearningRate 0.0105 Epoch: 13 Global Step: 167850 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:16:37,691-Speed 3272.70 samples/sec Loss 2.6788 
LearningRate 0.0105 Epoch: 13 Global Step: 167860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:40,818-Speed 3275.98 samples/sec Loss 2.6193 LearningRate 0.0105 Epoch: 13 Global Step: 167870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:43,886-Speed 3337.91 samples/sec Loss 2.6048 LearningRate 0.0105 Epoch: 13 Global Step: 167880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:47,010-Speed 3279.05 samples/sec Loss 2.5623 LearningRate 0.0105 Epoch: 13 Global Step: 167890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:50,127-Speed 3286.87 samples/sec Loss 2.6570 LearningRate 0.0105 Epoch: 13 Global Step: 167900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:53,297-Speed 3231.31 samples/sec Loss 2.5310 LearningRate 0.0105 Epoch: 13 Global Step: 167910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:56,453-Speed 3245.88 samples/sec Loss 2.5856 LearningRate 0.0105 Epoch: 13 Global Step: 167920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:16:59,561-Speed 3295.76 samples/sec Loss 2.6215 LearningRate 0.0105 Epoch: 13 Global Step: 167930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:17:02,651-Speed 3314.75 samples/sec Loss 2.5283 LearningRate 0.0105 Epoch: 13 Global Step: 167940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:17:05,752-Speed 3303.88 samples/sec Loss 2.5675 LearningRate 0.0105 Epoch: 13 Global Step: 167950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:17:08,846-Speed 3309.95 samples/sec Loss 2.5999 LearningRate 0.0105 Epoch: 13 Global Step: 167960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:17:11,923-Speed 3329.52 samples/sec Loss 2.5127 LearningRate 0.0105 Epoch: 13 Global Step: 167970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:17:15,060-Speed 3265.02 samples/sec Loss 2.5544 LearningRate 0.0105 Epoch: 13 Global Step: 
167980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:17:18,163-Speed 3301.50 samples/sec Loss 2.4990 LearningRate 0.0105 Epoch: 13 Global Step: 167990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:17:21,223-Speed 3347.02 samples/sec Loss 2.5470 LearningRate 0.0105 Epoch: 13 Global Step: 168000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:17:24,359-Speed 3266.09 samples/sec Loss 2.6281 LearningRate 0.0105 Epoch: 13 Global Step: 168010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:17:27,487-Speed 3274.99 samples/sec Loss 2.5611 LearningRate 0.0105 Epoch: 13 Global Step: 168020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:17:30,577-Speed 3315.06 samples/sec Loss 2.6287 LearningRate 0.0105 Epoch: 13 Global Step: 168030 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:33,637-Speed 3347.10 samples/sec Loss 2.5728 LearningRate 0.0105 Epoch: 13 Global Step: 168040 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:36,826-Speed 3212.40 samples/sec Loss 2.6096 LearningRate 0.0105 Epoch: 13 Global Step: 168050 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:39,912-Speed 3319.45 samples/sec Loss 2.6272 LearningRate 0.0105 Epoch: 13 Global Step: 168060 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:43,077-Speed 3236.16 samples/sec Loss 2.5416 LearningRate 0.0105 Epoch: 13 Global Step: 168070 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:46,153-Speed 3330.11 samples/sec Loss 2.5616 LearningRate 0.0105 Epoch: 13 Global Step: 168080 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:49,260-Speed 3295.87 samples/sec Loss 2.6029 LearningRate 0.0105 Epoch: 13 Global Step: 168090 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:52,359-Speed 3306.34 samples/sec Loss 2.5990 LearningRate 0.0105 Epoch: 13 Global Step: 168100 Fp16 Grad Scale: 8192 Required: 7 hours 
Training: 2022-04-27 16:17:55,483-Speed 3278.54 samples/sec Loss 2.6027 LearningRate 0.0105 Epoch: 13 Global Step: 168110 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:17:58,563-Speed 3325.18 samples/sec Loss 2.6627 LearningRate 0.0104 Epoch: 13 Global Step: 168120 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:18:01,638-Speed 3332.09 samples/sec Loss 2.5098 LearningRate 0.0104 Epoch: 13 Global Step: 168130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:04,741-Speed 3301.06 samples/sec Loss 2.6051 LearningRate 0.0104 Epoch: 13 Global Step: 168140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:07,840-Speed 3304.69 samples/sec Loss 2.5771 LearningRate 0.0104 Epoch: 13 Global Step: 168150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:10,949-Speed 3295.12 samples/sec Loss 2.5436 LearningRate 0.0104 Epoch: 13 Global Step: 168160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:14,131-Speed 3219.16 samples/sec Loss 2.5922 LearningRate 0.0104 Epoch: 13 Global Step: 168170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:17,371-Speed 3161.19 samples/sec Loss 2.5844 LearningRate 0.0104 Epoch: 13 Global Step: 168180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:20,467-Speed 3308.58 samples/sec Loss 2.6896 LearningRate 0.0104 Epoch: 13 Global Step: 168190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:23,586-Speed 3284.86 samples/sec Loss 2.5940 LearningRate 0.0104 Epoch: 13 Global Step: 168200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:26,739-Speed 3248.27 samples/sec Loss 2.6035 LearningRate 0.0104 Epoch: 13 Global Step: 168210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:29,868-Speed 3273.73 samples/sec Loss 2.6021 LearningRate 0.0104 Epoch: 13 Global Step: 168220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:18:32,952-Speed 
3320.85 samples/sec Loss 2.5832 LearningRate 0.0104 Epoch: 13 Global Step: 168230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:36,148-Speed 3205.46 samples/sec Loss 2.6376 LearningRate 0.0104 Epoch: 13 Global Step: 168240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:39,298-Speed 3252.11 samples/sec Loss 2.5737 LearningRate 0.0104 Epoch: 13 Global Step: 168250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:42,394-Speed 3307.60 samples/sec Loss 2.6131 LearningRate 0.0104 Epoch: 13 Global Step: 168260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:45,482-Speed 3317.20 samples/sec Loss 2.5887 LearningRate 0.0104 Epoch: 13 Global Step: 168270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:48,558-Speed 3330.76 samples/sec Loss 2.6816 LearningRate 0.0104 Epoch: 13 Global Step: 168280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:51,689-Speed 3271.07 samples/sec Loss 2.5708 LearningRate 0.0104 Epoch: 13 Global Step: 168290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:54,892-Speed 3198.19 samples/sec Loss 2.5586 LearningRate 0.0104 Epoch: 13 Global Step: 168300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:18:57,990-Speed 3306.15 samples/sec Loss 2.5471 LearningRate 0.0104 Epoch: 13 Global Step: 168310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:01,131-Speed 3260.66 samples/sec Loss 2.5593 LearningRate 0.0104 Epoch: 13 Global Step: 168320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:04,222-Speed 3314.35 samples/sec Loss 2.6604 LearningRate 0.0104 Epoch: 13 Global Step: 168330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:07,333-Speed 3292.00 samples/sec Loss 2.6082 LearningRate 0.0104 Epoch: 13 Global Step: 168340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:10,466-Speed 3269.29 samples/sec Loss 2.5892 
LearningRate 0.0104 Epoch: 13 Global Step: 168350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:13,663-Speed 3204.35 samples/sec Loss 2.5809 LearningRate 0.0104 Epoch: 13 Global Step: 168360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:16,856-Speed 3208.20 samples/sec Loss 2.5737 LearningRate 0.0104 Epoch: 13 Global Step: 168370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:20,006-Speed 3251.96 samples/sec Loss 2.6921 LearningRate 0.0104 Epoch: 13 Global Step: 168380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:23,128-Speed 3280.30 samples/sec Loss 2.6160 LearningRate 0.0104 Epoch: 13 Global Step: 168390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:26,263-Speed 3267.71 samples/sec Loss 2.6215 LearningRate 0.0104 Epoch: 13 Global Step: 168400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:29,378-Speed 3288.36 samples/sec Loss 2.5737 LearningRate 0.0104 Epoch: 13 Global Step: 168410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:32,447-Speed 3337.85 samples/sec Loss 2.6126 LearningRate 0.0104 Epoch: 13 Global Step: 168420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:35,555-Speed 3295.31 samples/sec Loss 2.6314 LearningRate 0.0104 Epoch: 13 Global Step: 168430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:38,658-Speed 3301.25 samples/sec Loss 2.6783 LearningRate 0.0104 Epoch: 13 Global Step: 168440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:41,814-Speed 3245.16 samples/sec Loss 2.6192 LearningRate 0.0104 Epoch: 13 Global Step: 168450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:44,905-Speed 3314.51 samples/sec Loss 2.6147 LearningRate 0.0104 Epoch: 13 Global Step: 168460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:48,054-Speed 3252.17 samples/sec Loss 2.5667 LearningRate 0.0104 Epoch: 13 Global Step: 
168470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:51,182-Speed 3274.98 samples/sec Loss 2.6529 LearningRate 0.0104 Epoch: 13 Global Step: 168480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:54,282-Speed 3304.57 samples/sec Loss 2.6437 LearningRate 0.0104 Epoch: 13 Global Step: 168490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:19:57,364-Speed 3323.57 samples/sec Loss 2.6065 LearningRate 0.0103 Epoch: 13 Global Step: 168500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:20:00,487-Speed 3279.40 samples/sec Loss 2.6217 LearningRate 0.0103 Epoch: 13 Global Step: 168510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:20:03,582-Speed 3309.00 samples/sec Loss 2.5329 LearningRate 0.0103 Epoch: 13 Global Step: 168520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:20:06,735-Speed 3249.10 samples/sec Loss 2.6577 LearningRate 0.0103 Epoch: 13 Global Step: 168530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 16:20:09,805-Speed 3336.35 samples/sec Loss 2.5307 LearningRate 0.0103 Epoch: 13 Global Step: 168540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 16:20:12,920-Speed 3288.99 samples/sec Loss 2.5768 LearningRate 0.0103 Epoch: 13 Global Step: 168550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 16:20:16,041-Speed 3282.09 samples/sec Loss 2.5568 LearningRate 0.0103 Epoch: 13 Global Step: 168560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:19,187-Speed 3255.59 samples/sec Loss 2.5740 LearningRate 0.0103 Epoch: 13 Global Step: 168570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:22,256-Speed 3337.76 samples/sec Loss 2.5885 LearningRate 0.0103 Epoch: 13 Global Step: 168580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:25,381-Speed 3277.52 samples/sec Loss 2.6147 LearningRate 0.0103 Epoch: 13 Global Step: 168590 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2022-04-27 16:20:28,528-Speed 3255.39 samples/sec Loss 2.6070 LearningRate 0.0103 Epoch: 13 Global Step: 168600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:31,621-Speed 3310.92 samples/sec Loss 2.6466 LearningRate 0.0103 Epoch: 13 Global Step: 168610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:34,739-Speed 3286.01 samples/sec Loss 2.6277 LearningRate 0.0103 Epoch: 13 Global Step: 168620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:37,829-Speed 3314.13 samples/sec Loss 2.5548 LearningRate 0.0103 Epoch: 13 Global Step: 168630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:41,028-Speed 3201.79 samples/sec Loss 2.5911 LearningRate 0.0103 Epoch: 13 Global Step: 168640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:44,134-Speed 3298.73 samples/sec Loss 2.6875 LearningRate 0.0103 Epoch: 13 Global Step: 168650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:20:47,248-Speed 3289.58 samples/sec Loss 2.6189 LearningRate 0.0103 Epoch: 13 Global Step: 168660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:20:50,364-Speed 3286.98 samples/sec Loss 2.6201 LearningRate 0.0103 Epoch: 13 Global Step: 168670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:20:53,484-Speed 3282.55 samples/sec Loss 2.5963 LearningRate 0.0103 Epoch: 13 Global Step: 168680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:20:56,557-Speed 3333.81 samples/sec Loss 2.6139 LearningRate 0.0103 Epoch: 13 Global Step: 168690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:20:59,629-Speed 3334.16 samples/sec Loss 2.6358 LearningRate 0.0103 Epoch: 13 Global Step: 168700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:21:02,729-Speed 3304.95 samples/sec Loss 2.5957 LearningRate 0.0103 Epoch: 13 Global Step: 168710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
16:21:05,846-Speed 3286.27 samples/sec Loss 2.5674 LearningRate 0.0103 Epoch: 13 Global Step: 168720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:21:08,950-Speed 3299.01 samples/sec Loss 2.5730 LearningRate 0.0103 Epoch: 13 Global Step: 168730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:21:12,043-Speed 3311.84 samples/sec Loss 2.6473 LearningRate 0.0103 Epoch: 13 Global Step: 168740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:21:15,206-Speed 3239.05 samples/sec Loss 2.5789 LearningRate 0.0103 Epoch: 13 Global Step: 168750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:21:18,309-Speed 3300.82 samples/sec Loss 2.5993 LearningRate 0.0103 Epoch: 13 Global Step: 168760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 16:21:21,375-Speed 3340.76 samples/sec Loss 2.7084 LearningRate 0.0103 Epoch: 13 Global Step: 168770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:21:24,478-Speed 3301.63 samples/sec Loss 2.5872 LearningRate 0.0103 Epoch: 13 Global Step: 168780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:27,595-Speed 3285.79 samples/sec Loss 2.5236 LearningRate 0.0103 Epoch: 13 Global Step: 168790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:30,734-Speed 3263.50 samples/sec Loss 2.5832 LearningRate 0.0103 Epoch: 13 Global Step: 168800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:33,879-Speed 3257.14 samples/sec Loss 2.5675 LearningRate 0.0103 Epoch: 13 Global Step: 168810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:37,030-Speed 3250.56 samples/sec Loss 2.5771 LearningRate 0.0103 Epoch: 13 Global Step: 168820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:40,154-Speed 3278.44 samples/sec Loss 2.5916 LearningRate 0.0103 Epoch: 13 Global Step: 168830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:43,323-Speed 3233.00 samples/sec Loss 
2.5956 LearningRate 0.0103 Epoch: 13 Global Step: 168840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:46,459-Speed 3266.37 samples/sec Loss 2.6245 LearningRate 0.0103 Epoch: 13 Global Step: 168850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:49,593-Speed 3268.04 samples/sec Loss 2.6055 LearningRate 0.0103 Epoch: 13 Global Step: 168860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:52,760-Speed 3234.06 samples/sec Loss 2.6098 LearningRate 0.0103 Epoch: 13 Global Step: 168870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:21:55,852-Speed 3313.50 samples/sec Loss 2.6307 LearningRate 0.0103 Epoch: 13 Global Step: 168880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:21:58,945-Speed 3310.90 samples/sec Loss 2.6119 LearningRate 0.0102 Epoch: 13 Global Step: 168890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:02,017-Speed 3334.79 samples/sec Loss 2.5553 LearningRate 0.0102 Epoch: 13 Global Step: 168900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:05,119-Speed 3301.52 samples/sec Loss 2.5783 LearningRate 0.0102 Epoch: 13 Global Step: 168910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:08,180-Speed 3346.48 samples/sec Loss 2.6466 LearningRate 0.0102 Epoch: 13 Global Step: 168920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:11,285-Speed 3298.85 samples/sec Loss 2.5790 LearningRate 0.0102 Epoch: 13 Global Step: 168930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:14,493-Speed 3193.41 samples/sec Loss 2.5735 LearningRate 0.0102 Epoch: 13 Global Step: 168940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:17,578-Speed 3320.00 samples/sec Loss 2.6253 LearningRate 0.0102 Epoch: 13 Global Step: 168950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:20,677-Speed 3305.29 samples/sec Loss 2.5656 LearningRate 0.0102 Epoch: 13 Global 
Step: 168960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:23,803-Speed 3277.21 samples/sec Loss 2.5891 LearningRate 0.0102 Epoch: 13 Global Step: 168970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:26,884-Speed 3324.95 samples/sec Loss 2.6338 LearningRate 0.0102 Epoch: 13 Global Step: 168980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:30,013-Speed 3273.53 samples/sec Loss 2.5714 LearningRate 0.0102 Epoch: 13 Global Step: 168990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:33,063-Speed 3358.26 samples/sec Loss 2.6191 LearningRate 0.0102 Epoch: 13 Global Step: 169000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:36,164-Speed 3303.77 samples/sec Loss 2.6711 LearningRate 0.0102 Epoch: 13 Global Step: 169010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:39,296-Speed 3269.86 samples/sec Loss 2.6403 LearningRate 0.0102 Epoch: 13 Global Step: 169020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:42,380-Speed 3321.78 samples/sec Loss 2.5482 LearningRate 0.0102 Epoch: 13 Global Step: 169030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:45,451-Speed 3335.31 samples/sec Loss 2.5261 LearningRate 0.0102 Epoch: 13 Global Step: 169040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:22:48,620-Speed 3232.39 samples/sec Loss 2.5888 LearningRate 0.0102 Epoch: 13 Global Step: 169050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:51,741-Speed 3282.16 samples/sec Loss 2.5579 LearningRate 0.0102 Epoch: 13 Global Step: 169060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:54,872-Speed 3271.29 samples/sec Loss 2.6131 LearningRate 0.0102 Epoch: 13 Global Step: 169070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:22:57,947-Speed 3331.67 samples/sec Loss 2.6225 LearningRate 0.0102 Epoch: 13 Global Step: 169080 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 16:23:01,139-Speed 3208.46 samples/sec Loss 2.5493 LearningRate 0.0102 Epoch: 13 Global Step: 169090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:23:04,225-Speed 3319.48 samples/sec Loss 2.5684 LearningRate 0.0102 Epoch: 13 Global Step: 169100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:23:07,411-Speed 3215.55 samples/sec Loss 2.6116 LearningRate 0.0102 Epoch: 13 Global Step: 169110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:23:10,477-Speed 3341.08 samples/sec Loss 2.5619 LearningRate 0.0102 Epoch: 13 Global Step: 169120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:23:13,577-Speed 3304.21 samples/sec Loss 2.6387 LearningRate 0.0102 Epoch: 13 Global Step: 169130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:23:16,650-Speed 3333.47 samples/sec Loss 2.5783 LearningRate 0.0102 Epoch: 13 Global Step: 169140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:19,760-Speed 3293.07 samples/sec Loss 2.6016 LearningRate 0.0102 Epoch: 13 Global Step: 169150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:22,879-Speed 3284.32 samples/sec Loss 2.6553 LearningRate 0.0102 Epoch: 13 Global Step: 169160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:26,096-Speed 3183.62 samples/sec Loss 2.6454 LearningRate 0.0102 Epoch: 13 Global Step: 169170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:29,188-Speed 3312.77 samples/sec Loss 2.6510 LearningRate 0.0102 Epoch: 13 Global Step: 169180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:32,272-Speed 3322.28 samples/sec Loss 2.5840 LearningRate 0.0102 Epoch: 13 Global Step: 169190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:35,346-Speed 3332.10 samples/sec Loss 2.6402 LearningRate 0.0102 Epoch: 13 Global Step: 169200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
16:23:38,424-Speed 3327.75 samples/sec Loss 2.6291 LearningRate 0.0102 Epoch: 13 Global Step: 169210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:41,541-Speed 3285.61 samples/sec Loss 2.5966 LearningRate 0.0102 Epoch: 13 Global Step: 169220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:23:44,608-Speed 3340.62 samples/sec Loss 2.6032 LearningRate 0.0102 Epoch: 13 Global Step: 169230 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:23:47,723-Speed 3287.97 samples/sec Loss 2.6654 LearningRate 0.0102 Epoch: 13 Global Step: 169240 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:23:50,846-Speed 3280.25 samples/sec Loss 2.5810 LearningRate 0.0102 Epoch: 13 Global Step: 169250 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:23:53,985-Speed 3263.29 samples/sec Loss 2.6252 LearningRate 0.0102 Epoch: 13 Global Step: 169260 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:23:57,039-Speed 3353.81 samples/sec Loss 2.6723 LearningRate 0.0102 Epoch: 13 Global Step: 169270 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:00,193-Speed 3248.36 samples/sec Loss 2.6064 LearningRate 0.0101 Epoch: 13 Global Step: 169280 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:03,338-Speed 3256.24 samples/sec Loss 2.7093 LearningRate 0.0101 Epoch: 13 Global Step: 169290 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:06,503-Speed 3236.49 samples/sec Loss 2.5879 LearningRate 0.0101 Epoch: 13 Global Step: 169300 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:09,604-Speed 3303.58 samples/sec Loss 2.6667 LearningRate 0.0101 Epoch: 13 Global Step: 169310 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:12,678-Speed 3332.23 samples/sec Loss 2.6968 LearningRate 0.0101 Epoch: 13 Global Step: 169320 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:15,883-Speed 3195.73 samples/sec Loss 2.6396 
LearningRate 0.0101 Epoch: 13 Global Step: 169330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:24:18,942-Speed 3348.92 samples/sec Loss 2.6141 LearningRate 0.0101 Epoch: 13 Global Step: 169340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:24:22,078-Speed 3265.69 samples/sec Loss 2.6382 LearningRate 0.0101 Epoch: 13 Global Step: 169350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:24:25,276-Speed 3203.30 samples/sec Loss 2.5566 LearningRate 0.0101 Epoch: 13 Global Step: 169360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:24:28,507-Speed 3169.85 samples/sec Loss 2.6310 LearningRate 0.0101 Epoch: 13 Global Step: 169370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:24:31,659-Speed 3250.41 samples/sec Loss 2.6249 LearningRate 0.0101 Epoch: 13 Global Step: 169380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:24:34,831-Speed 3228.79 samples/sec Loss 2.5223 LearningRate 0.0101 Epoch: 13 Global Step: 169390 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:37,997-Speed 3235.53 samples/sec Loss 2.5641 LearningRate 0.0101 Epoch: 13 Global Step: 169400 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:41,089-Speed 3312.40 samples/sec Loss 2.5721 LearningRate 0.0101 Epoch: 13 Global Step: 169410 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:44,174-Speed 3320.71 samples/sec Loss 2.5938 LearningRate 0.0101 Epoch: 13 Global Step: 169420 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:47,260-Speed 3319.70 samples/sec Loss 2.6359 LearningRate 0.0101 Epoch: 13 Global Step: 169430 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:50,398-Speed 3264.50 samples/sec Loss 2.5812 LearningRate 0.0101 Epoch: 13 Global Step: 169440 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:53,559-Speed 3240.44 samples/sec Loss 2.6146 LearningRate 0.0101 Epoch: 13 Global Step: 169450 
Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:56,693-Speed 3268.38 samples/sec Loss 2.6430 LearningRate 0.0101 Epoch: 13 Global Step: 169460 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:24:59,827-Speed 3268.34 samples/sec Loss 2.6633 LearningRate 0.0101 Epoch: 13 Global Step: 169470 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:25:02,942-Speed 3288.42 samples/sec Loss 2.6145 LearningRate 0.0101 Epoch: 13 Global Step: 169480 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:25:06,070-Speed 3274.94 samples/sec Loss 2.6013 LearningRate 0.0101 Epoch: 13 Global Step: 169490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:09,163-Speed 3311.63 samples/sec Loss 2.6129 LearningRate 0.0101 Epoch: 13 Global Step: 169500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:12,319-Speed 3245.09 samples/sec Loss 2.6404 LearningRate 0.0101 Epoch: 13 Global Step: 169510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:15,489-Speed 3231.49 samples/sec Loss 2.6062 LearningRate 0.0101 Epoch: 13 Global Step: 169520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:18,585-Speed 3308.18 samples/sec Loss 2.6382 LearningRate 0.0101 Epoch: 13 Global Step: 169530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:21,672-Speed 3318.27 samples/sec Loss 2.5101 LearningRate 0.0101 Epoch: 13 Global Step: 169540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:24,777-Speed 3299.01 samples/sec Loss 2.6188 LearningRate 0.0101 Epoch: 13 Global Step: 169550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:27,922-Speed 3257.05 samples/sec Loss 2.5977 LearningRate 0.0101 Epoch: 13 Global Step: 169560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:31,036-Speed 3289.32 samples/sec Loss 2.6076 LearningRate 0.0101 Epoch: 13 Global Step: 169570 Fp16 Grad Scale: 16384 Required: 7 hours 
Training: 2022-04-27 16:25:34,172-Speed 3266.13 samples/sec Loss 2.6376 LearningRate 0.0101 Epoch: 13 Global Step: 169580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:37,287-Speed 3289.11 samples/sec Loss 2.5824 LearningRate 0.0101 Epoch: 13 Global Step: 169590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:25:40,511-Speed 3177.25 samples/sec Loss 2.6315 LearningRate 0.0101 Epoch: 13 Global Step: 169600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:43,649-Speed 3264.46 samples/sec Loss 2.6269 LearningRate 0.0101 Epoch: 13 Global Step: 169610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:46,729-Speed 3325.00 samples/sec Loss 2.6325 LearningRate 0.0101 Epoch: 13 Global Step: 169620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:49,816-Speed 3319.14 samples/sec Loss 2.6560 LearningRate 0.0101 Epoch: 13 Global Step: 169630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:52,898-Speed 3322.79 samples/sec Loss 2.5736 LearningRate 0.0101 Epoch: 13 Global Step: 169640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:55,966-Speed 3339.47 samples/sec Loss 2.5566 LearningRate 0.0101 Epoch: 13 Global Step: 169650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:25:59,041-Speed 3331.12 samples/sec Loss 2.5695 LearningRate 0.0101 Epoch: 13 Global Step: 169660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:02,199-Speed 3243.63 samples/sec Loss 2.5677 LearningRate 0.0100 Epoch: 13 Global Step: 169670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:05,273-Speed 3332.31 samples/sec Loss 2.5782 LearningRate 0.0100 Epoch: 13 Global Step: 169680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:08,361-Speed 3316.73 samples/sec Loss 2.6170 LearningRate 0.0100 Epoch: 13 Global Step: 169690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:11,515-Speed 
3247.82 samples/sec Loss 2.6179 LearningRate 0.0100 Epoch: 13 Global Step: 169700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:26:14,610-Speed 3309.62 samples/sec Loss 2.6083 LearningRate 0.0100 Epoch: 13 Global Step: 169710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:26:17,700-Speed 3314.83 samples/sec Loss 2.5279 LearningRate 0.0100 Epoch: 13 Global Step: 169720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:26:20,769-Speed 3338.25 samples/sec Loss 2.6556 LearningRate 0.0100 Epoch: 13 Global Step: 169730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:26:23,847-Speed 3327.84 samples/sec Loss 2.6074 LearningRate 0.0100 Epoch: 13 Global Step: 169740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:26:26,961-Speed 3289.17 samples/sec Loss 2.5694 LearningRate 0.0100 Epoch: 13 Global Step: 169750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:30,064-Speed 3301.14 samples/sec Loss 2.6844 LearningRate 0.0100 Epoch: 13 Global Step: 169760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:33,135-Speed 3335.46 samples/sec Loss 2.6070 LearningRate 0.0100 Epoch: 13 Global Step: 169770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:36,293-Speed 3243.11 samples/sec Loss 2.6482 LearningRate 0.0100 Epoch: 13 Global Step: 169780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:39,384-Speed 3313.47 samples/sec Loss 2.6321 LearningRate 0.0100 Epoch: 13 Global Step: 169790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:42,466-Speed 3324.34 samples/sec Loss 2.6162 LearningRate 0.0100 Epoch: 13 Global Step: 169800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:26:45,585-Speed 3284.29 samples/sec Loss 2.6387 LearningRate 0.0100 Epoch: 13 Global Step: 169810 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:26:48,724-Speed 3262.52 samples/sec Loss 2.6334 LearningRate 
0.0100 Epoch: 13 Global Step: 169820 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:26:51,845-Speed 3282.77 samples/sec Loss 2.6566 LearningRate 0.0100 Epoch: 13 Global Step: 169830 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:26:54,987-Speed 3260.32 samples/sec Loss 2.6821 LearningRate 0.0100 Epoch: 13 Global Step: 169840 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:26:58,038-Speed 3356.88 samples/sec Loss 2.6923 LearningRate 0.0100 Epoch: 13 Global Step: 169850 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:01,096-Speed 3349.38 samples/sec Loss 2.6149 LearningRate 0.0100 Epoch: 13 Global Step: 169860 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:04,190-Speed 3310.61 samples/sec Loss 2.6014 LearningRate 0.0100 Epoch: 13 Global Step: 169870 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:07,367-Speed 3224.77 samples/sec Loss 2.5890 LearningRate 0.0100 Epoch: 13 Global Step: 169880 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:10,429-Speed 3344.71 samples/sec Loss 2.7073 LearningRate 0.0100 Epoch: 13 Global Step: 169890 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:13,531-Speed 3302.77 samples/sec Loss 2.5996 LearningRate 0.0100 Epoch: 13 Global Step: 169900 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:16,645-Speed 3289.79 samples/sec Loss 2.6506 LearningRate 0.0100 Epoch: 13 Global Step: 169910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:27:19,736-Speed 3313.55 samples/sec Loss 2.6336 LearningRate 0.0100 Epoch: 13 Global Step: 169920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:27:22,830-Speed 3311.00 samples/sec Loss 2.6389 LearningRate 0.0100 Epoch: 13 Global Step: 169930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:27:25,883-Speed 3355.23 samples/sec Loss 2.5981 LearningRate 0.0100 Epoch: 13 Global Step: 169940 Fp16 Grad 
Scale: 16384 Required: 7 hours Training: 2022-04-27 16:27:28,956-Speed 3333.22 samples/sec Loss 2.5897 LearningRate 0.0100 Epoch: 13 Global Step: 169950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:27:32,058-Speed 3302.58 samples/sec Loss 2.6097 LearningRate 0.0100 Epoch: 13 Global Step: 169960 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:35,145-Speed 3318.16 samples/sec Loss 2.5965 LearningRate 0.0100 Epoch: 13 Global Step: 169970 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:38,346-Speed 3200.23 samples/sec Loss 2.6378 LearningRate 0.0100 Epoch: 13 Global Step: 169980 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:41,474-Speed 3274.27 samples/sec Loss 2.6224 LearningRate 0.0100 Epoch: 13 Global Step: 169990 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:44,570-Speed 3308.69 samples/sec Loss 2.6149 LearningRate 0.0100 Epoch: 13 Global Step: 170000 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:47,675-Speed 3298.39 samples/sec Loss 2.6604 LearningRate 0.0100 Epoch: 13 Global Step: 170010 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:50,758-Speed 3322.41 samples/sec Loss 2.5618 LearningRate 0.0100 Epoch: 13 Global Step: 170020 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:53,870-Speed 3291.77 samples/sec Loss 2.5681 LearningRate 0.0100 Epoch: 13 Global Step: 170030 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:27:56,989-Speed 3284.08 samples/sec Loss 2.5561 LearningRate 0.0100 Epoch: 13 Global Step: 170040 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:28:00,082-Speed 3311.41 samples/sec Loss 2.6363 LearningRate 0.0100 Epoch: 13 Global Step: 170050 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:28:03,238-Speed 3246.47 samples/sec Loss 2.5960 LearningRate 0.0099 Epoch: 13 Global Step: 170060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
16:28:06,311-Speed 3332.63 samples/sec Loss 2.6495 LearningRate 0.0099 Epoch: 13 Global Step: 170070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:09,391-Speed 3326.23 samples/sec Loss 2.5658 LearningRate 0.0099 Epoch: 13 Global Step: 170080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:12,500-Speed 3294.98 samples/sec Loss 2.6518 LearningRate 0.0099 Epoch: 13 Global Step: 170090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:15,653-Speed 3247.92 samples/sec Loss 2.6544 LearningRate 0.0099 Epoch: 13 Global Step: 170100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:18,748-Speed 3310.91 samples/sec Loss 2.5972 LearningRate 0.0099 Epoch: 13 Global Step: 170110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:21,819-Speed 3335.40 samples/sec Loss 2.6644 LearningRate 0.0099 Epoch: 13 Global Step: 170120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:24,919-Speed 3303.33 samples/sec Loss 2.6238 LearningRate 0.0099 Epoch: 13 Global Step: 170130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:28,006-Speed 3319.08 samples/sec Loss 2.6035 LearningRate 0.0099 Epoch: 13 Global Step: 170140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:31,200-Speed 3206.58 samples/sec Loss 2.6035 LearningRate 0.0099 Epoch: 13 Global Step: 170150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:34,237-Speed 3372.57 samples/sec Loss 2.5789 LearningRate 0.0099 Epoch: 13 Global Step: 170160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:37,323-Speed 3319.92 samples/sec Loss 2.6295 LearningRate 0.0099 Epoch: 13 Global Step: 170170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:40,450-Speed 3275.52 samples/sec Loss 2.5777 LearningRate 0.0099 Epoch: 13 Global Step: 170180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:43,582-Speed 3269.93 samples/sec Loss 
2.5800 LearningRate 0.0099 Epoch: 13 Global Step: 170190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:46,674-Speed 3312.73 samples/sec Loss 2.6617 LearningRate 0.0099 Epoch: 13 Global Step: 170200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:49,755-Speed 3325.39 samples/sec Loss 2.6134 LearningRate 0.0099 Epoch: 13 Global Step: 170210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:52,887-Speed 3270.22 samples/sec Loss 2.6583 LearningRate 0.0099 Epoch: 13 Global Step: 170220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:55,980-Speed 3311.32 samples/sec Loss 2.6504 LearningRate 0.0099 Epoch: 13 Global Step: 170230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:28:59,044-Speed 3343.10 samples/sec Loss 2.5980 LearningRate 0.0099 Epoch: 13 Global Step: 170240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:02,172-Speed 3274.37 samples/sec Loss 2.6214 LearningRate 0.0099 Epoch: 13 Global Step: 170250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:05,274-Speed 3302.31 samples/sec Loss 2.6133 LearningRate 0.0099 Epoch: 13 Global Step: 170260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:29:08,344-Speed 3337.26 samples/sec Loss 2.6036 LearningRate 0.0099 Epoch: 13 Global Step: 170270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:29:11,403-Speed 3347.77 samples/sec Loss 2.5857 LearningRate 0.0099 Epoch: 13 Global Step: 170280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:29:14,477-Speed 3332.01 samples/sec Loss 2.5640 LearningRate 0.0099 Epoch: 13 Global Step: 170290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:17,590-Speed 3291.00 samples/sec Loss 2.6688 LearningRate 0.0099 Epoch: 13 Global Step: 170300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:20,699-Speed 3294.61 samples/sec Loss 2.5421 LearningRate 0.0099 Epoch: 13 Global 
Step: 170310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:23,812-Speed 3290.92 samples/sec Loss 2.5698 LearningRate 0.0099 Epoch: 13 Global Step: 170320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:26,963-Speed 3250.88 samples/sec Loss 2.6040 LearningRate 0.0099 Epoch: 13 Global Step: 170330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:30,151-Speed 3212.67 samples/sec Loss 2.6226 LearningRate 0.0099 Epoch: 13 Global Step: 170340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:33,204-Speed 3355.01 samples/sec Loss 2.5899 LearningRate 0.0099 Epoch: 13 Global Step: 170350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:36,323-Speed 3284.56 samples/sec Loss 2.6728 LearningRate 0.0099 Epoch: 13 Global Step: 170360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:39,536-Speed 3187.64 samples/sec Loss 2.6071 LearningRate 0.0099 Epoch: 13 Global Step: 170370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:42,651-Speed 3288.64 samples/sec Loss 2.6097 LearningRate 0.0099 Epoch: 13 Global Step: 170380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:45,726-Speed 3330.96 samples/sec Loss 2.6693 LearningRate 0.0099 Epoch: 13 Global Step: 170390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:29:48,815-Speed 3315.77 samples/sec Loss 2.5882 LearningRate 0.0099 Epoch: 13 Global Step: 170400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:29:51,859-Speed 3365.24 samples/sec Loss 2.5847 LearningRate 0.0099 Epoch: 13 Global Step: 170410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:54,967-Speed 3296.21 samples/sec Loss 2.6241 LearningRate 0.0099 Epoch: 13 Global Step: 170420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:29:58,040-Speed 3332.72 samples/sec Loss 2.6002 LearningRate 0.0099 Epoch: 13 Global Step: 170430 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 16:30:01,121-Speed 3325.02 samples/sec Loss 2.5713 LearningRate 0.0099 Epoch: 13 Global Step: 170440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:04,213-Speed 3313.05 samples/sec Loss 2.5702 LearningRate 0.0099 Epoch: 13 Global Step: 170450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:07,368-Speed 3246.22 samples/sec Loss 2.6087 LearningRate 0.0098 Epoch: 13 Global Step: 170460 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:10,445-Speed 3328.72 samples/sec Loss 2.6970 LearningRate 0.0098 Epoch: 13 Global Step: 170470 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:13,596-Speed 3251.52 samples/sec Loss 2.6327 LearningRate 0.0098 Epoch: 13 Global Step: 170480 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:16,760-Speed 3237.24 samples/sec Loss 2.6188 LearningRate 0.0098 Epoch: 13 Global Step: 170490 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:19,831-Speed 3334.80 samples/sec Loss 2.5449 LearningRate 0.0098 Epoch: 13 Global Step: 170500 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:22,944-Speed 3290.85 samples/sec Loss 2.6028 LearningRate 0.0098 Epoch: 13 Global Step: 170510 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:26,061-Speed 3286.14 samples/sec Loss 2.6103 LearningRate 0.0098 Epoch: 13 Global Step: 170520 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:29,179-Speed 3285.43 samples/sec Loss 2.6711 LearningRate 0.0098 Epoch: 13 Global Step: 170530 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:32,255-Speed 3330.34 samples/sec Loss 2.6239 LearningRate 0.0098 Epoch: 13 Global Step: 170540 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:30:35,414-Speed 3241.91 samples/sec Loss 2.5981 LearningRate 0.0098 Epoch: 13 Global Step: 170550 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 
16:30:38,519-Speed 3299.73 samples/sec Loss 2.5810 LearningRate 0.0098 Epoch: 13 Global Step: 170560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:41,595-Speed 3329.90 samples/sec Loss 2.6371 LearningRate 0.0098 Epoch: 13 Global Step: 170570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:44,667-Speed 3333.77 samples/sec Loss 2.6915 LearningRate 0.0098 Epoch: 13 Global Step: 170580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:47,802-Speed 3268.15 samples/sec Loss 2.6420 LearningRate 0.0098 Epoch: 13 Global Step: 170590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:50,870-Speed 3338.56 samples/sec Loss 2.6507 LearningRate 0.0098 Epoch: 13 Global Step: 170600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:53,951-Speed 3324.64 samples/sec Loss 2.6201 LearningRate 0.0098 Epoch: 13 Global Step: 170610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:30:57,008-Speed 3350.82 samples/sec Loss 2.7399 LearningRate 0.0098 Epoch: 13 Global Step: 170620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:00,107-Speed 3305.29 samples/sec Loss 2.6260 LearningRate 0.0098 Epoch: 13 Global Step: 170630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:03,227-Speed 3282.66 samples/sec Loss 2.7050 LearningRate 0.0098 Epoch: 13 Global Step: 170640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:06,332-Speed 3298.84 samples/sec Loss 2.5424 LearningRate 0.0098 Epoch: 13 Global Step: 170650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:09,381-Speed 3360.48 samples/sec Loss 2.6231 LearningRate 0.0098 Epoch: 13 Global Step: 170660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:31:12,544-Speed 3238.23 samples/sec Loss 2.6517 LearningRate 0.0098 Epoch: 13 Global Step: 170670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:15,678-Speed 3267.67 samples/sec Loss 
2.6673 LearningRate 0.0098 Epoch: 13 Global Step: 170680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:18,808-Speed 3272.66 samples/sec Loss 2.6302 LearningRate 0.0098 Epoch: 13 Global Step: 170690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:21,899-Speed 3313.79 samples/sec Loss 2.5916 LearningRate 0.0098 Epoch: 13 Global Step: 170700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:24,957-Speed 3349.87 samples/sec Loss 2.6286 LearningRate 0.0098 Epoch: 13 Global Step: 170710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:28,081-Speed 3278.95 samples/sec Loss 2.5891 LearningRate 0.0098 Epoch: 13 Global Step: 170720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:31,188-Speed 3296.69 samples/sec Loss 2.6328 LearningRate 0.0098 Epoch: 13 Global Step: 170730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:34,251-Speed 3344.25 samples/sec Loss 2.6050 LearningRate 0.0098 Epoch: 13 Global Step: 170740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:37,371-Speed 3284.02 samples/sec Loss 2.6761 LearningRate 0.0098 Epoch: 13 Global Step: 170750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:40,521-Speed 3251.48 samples/sec Loss 2.6049 LearningRate 0.0098 Epoch: 13 Global Step: 170760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:43,710-Speed 3212.14 samples/sec Loss 2.6601 LearningRate 0.0098 Epoch: 13 Global Step: 170770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:31:46,782-Speed 3334.39 samples/sec Loss 2.5760 LearningRate 0.0098 Epoch: 13 Global Step: 170780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:31:49,868-Speed 3319.41 samples/sec Loss 2.6021 LearningRate 0.0098 Epoch: 13 Global Step: 170790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:31:52,972-Speed 3299.86 samples/sec Loss 2.6281 LearningRate 0.0098 Epoch: 13 Global 
Step: 170800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:31:56,028-Speed 3351.60 samples/sec Loss 2.6340 LearningRate 0.0098 Epoch: 13 Global Step: 170810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:31:59,148-Speed 3283.15 samples/sec Loss 2.6630 LearningRate 0.0098 Epoch: 13 Global Step: 170820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:02,309-Speed 3241.05 samples/sec Loss 2.6468 LearningRate 0.0098 Epoch: 13 Global Step: 170830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:05,412-Speed 3301.65 samples/sec Loss 2.6219 LearningRate 0.0098 Epoch: 13 Global Step: 170840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:08,521-Speed 3294.29 samples/sec Loss 2.6408 LearningRate 0.0098 Epoch: 13 Global Step: 170850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:11,612-Speed 3313.50 samples/sec Loss 2.6799 LearningRate 0.0097 Epoch: 13 Global Step: 170860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:14,712-Speed 3304.10 samples/sec Loss 2.5506 LearningRate 0.0097 Epoch: 13 Global Step: 170870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:17,813-Speed 3303.27 samples/sec Loss 2.6754 LearningRate 0.0097 Epoch: 13 Global Step: 170880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:20,907-Speed 3310.56 samples/sec Loss 2.6338 LearningRate 0.0097 Epoch: 13 Global Step: 170890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:23,999-Speed 3312.94 samples/sec Loss 2.5983 LearningRate 0.0097 Epoch: 13 Global Step: 170900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:27,054-Speed 3353.00 samples/sec Loss 2.5892 LearningRate 0.0097 Epoch: 13 Global Step: 170910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:32:30,148-Speed 3310.67 samples/sec Loss 2.6506 LearningRate 0.0097 Epoch: 13 Global Step: 170920 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 16:32:33,251-Speed 3300.93 samples/sec Loss 2.6608 LearningRate 0.0097 Epoch: 13 Global Step: 170930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:32:36,374-Speed 3280.23 samples/sec Loss 2.6468 LearningRate 0.0097 Epoch: 13 Global Step: 170940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:32:39,535-Speed 3239.97 samples/sec Loss 2.5855 LearningRate 0.0097 Epoch: 13 Global Step: 170950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:32:42,621-Speed 3319.04 samples/sec Loss 2.5961 LearningRate 0.0097 Epoch: 13 Global Step: 170960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:32:45,695-Speed 3333.08 samples/sec Loss 2.6311 LearningRate 0.0097 Epoch: 13 Global Step: 170970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:32:48,863-Speed 3233.22 samples/sec Loss 2.7172 LearningRate 0.0097 Epoch: 13 Global Step: 170980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:32:51,969-Speed 3297.50 samples/sec Loss 2.6428 LearningRate 0.0097 Epoch: 13 Global Step: 170990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:55,129-Speed 3241.42 samples/sec Loss 2.5483 LearningRate 0.0097 Epoch: 13 Global Step: 171000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:32:58,206-Speed 3328.87 samples/sec Loss 2.6121 LearningRate 0.0097 Epoch: 13 Global Step: 171010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:01,288-Speed 3323.74 samples/sec Loss 2.7292 LearningRate 0.0097 Epoch: 13 Global Step: 171020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:04,438-Speed 3251.44 samples/sec Loss 2.6441 LearningRate 0.0097 Epoch: 13 Global Step: 171030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:07,558-Speed 3283.65 samples/sec Loss 2.6413 LearningRate 0.0097 Epoch: 13 Global Step: 171040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
16:33:10,621-Speed 3343.73 samples/sec Loss 2.6426 LearningRate 0.0097 Epoch: 13 Global Step: 171050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:13,815-Speed 3206.54 samples/sec Loss 2.6873 LearningRate 0.0097 Epoch: 13 Global Step: 171060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:16,944-Speed 3274.55 samples/sec Loss 2.5664 LearningRate 0.0097 Epoch: 13 Global Step: 171070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:20,070-Speed 3276.46 samples/sec Loss 2.5299 LearningRate 0.0097 Epoch: 13 Global Step: 171080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:23,130-Speed 3347.51 samples/sec Loss 2.6531 LearningRate 0.0097 Epoch: 13 Global Step: 171090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:26,214-Speed 3320.72 samples/sec Loss 2.6759 LearningRate 0.0097 Epoch: 13 Global Step: 171100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:29,288-Speed 3332.78 samples/sec Loss 2.5960 LearningRate 0.0097 Epoch: 13 Global Step: 171110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:32,387-Speed 3304.75 samples/sec Loss 2.5783 LearningRate 0.0097 Epoch: 13 Global Step: 171120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:35,474-Speed 3318.88 samples/sec Loss 2.6686 LearningRate 0.0097 Epoch: 13 Global Step: 171130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:38,636-Speed 3239.34 samples/sec Loss 2.6508 LearningRate 0.0097 Epoch: 13 Global Step: 171140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:41,772-Speed 3265.81 samples/sec Loss 2.6689 LearningRate 0.0097 Epoch: 13 Global Step: 171150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:44,873-Speed 3303.31 samples/sec Loss 2.6769 LearningRate 0.0097 Epoch: 13 Global Step: 171160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:48,022-Speed 3253.81 samples/sec Loss 
2.6670 LearningRate 0.0097 Epoch: 13 Global Step: 171170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:51,100-Speed 3327.48 samples/sec Loss 2.6474 LearningRate 0.0097 Epoch: 13 Global Step: 171180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:33:54,297-Speed 3203.37 samples/sec Loss 2.7149 LearningRate 0.0097 Epoch: 13 Global Step: 171190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:33:57,378-Speed 3325.12 samples/sec Loss 2.6780 LearningRate 0.0097 Epoch: 13 Global Step: 171200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:34:00,463-Speed 3320.35 samples/sec Loss 2.6366 LearningRate 0.0097 Epoch: 13 Global Step: 171210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:03,664-Speed 3200.83 samples/sec Loss 2.6665 LearningRate 0.0097 Epoch: 13 Global Step: 171220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:06,759-Speed 3309.80 samples/sec Loss 2.6233 LearningRate 0.0097 Epoch: 13 Global Step: 171230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:09,858-Speed 3304.30 samples/sec Loss 2.6561 LearningRate 0.0097 Epoch: 13 Global Step: 171240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:12,962-Speed 3300.38 samples/sec Loss 2.5795 LearningRate 0.0096 Epoch: 13 Global Step: 171250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:16,163-Speed 3200.88 samples/sec Loss 2.5776 LearningRate 0.0096 Epoch: 13 Global Step: 171260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:19,289-Speed 3276.62 samples/sec Loss 2.5753 LearningRate 0.0096 Epoch: 13 Global Step: 171270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:22,391-Speed 3301.65 samples/sec Loss 2.6744 LearningRate 0.0096 Epoch: 13 Global Step: 171280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:25,485-Speed 3311.07 samples/sec Loss 2.7278 LearningRate 0.0096 Epoch: 13 Global 
Step: 171290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:28,572-Speed 3318.48 samples/sec Loss 2.6499 LearningRate 0.0096 Epoch: 13 Global Step: 171300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:31,657-Speed 3320.45 samples/sec Loss 2.6418 LearningRate 0.0096 Epoch: 13 Global Step: 171310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:34:34,772-Speed 3289.12 samples/sec Loss 2.6240 LearningRate 0.0096 Epoch: 13 Global Step: 171320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:34:37,874-Speed 3302.12 samples/sec Loss 2.6090 LearningRate 0.0096 Epoch: 13 Global Step: 171330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:34:40,938-Speed 3342.58 samples/sec Loss 2.6064 LearningRate 0.0096 Epoch: 13 Global Step: 171340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:44,089-Speed 3250.90 samples/sec Loss 2.6021 LearningRate 0.0096 Epoch: 13 Global Step: 171350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:47,206-Speed 3286.24 samples/sec Loss 2.6298 LearningRate 0.0096 Epoch: 13 Global Step: 171360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:50,271-Speed 3342.31 samples/sec Loss 2.6803 LearningRate 0.0096 Epoch: 13 Global Step: 171370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:53,409-Speed 3263.60 samples/sec Loss 2.6594 LearningRate 0.0096 Epoch: 13 Global Step: 171380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:56,515-Speed 3298.80 samples/sec Loss 2.6933 LearningRate 0.0096 Epoch: 13 Global Step: 171390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:34:59,661-Speed 3256.49 samples/sec Loss 2.6181 LearningRate 0.0096 Epoch: 13 Global Step: 171400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:35:02,862-Speed 3199.16 samples/sec Loss 2.6597 LearningRate 0.0096 Epoch: 13 Global Step: 171410 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 16:35:05,979-Speed 3286.62 samples/sec Loss 2.6578 LearningRate 0.0096 Epoch: 13 Global Step: 171420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:35:09,063-Speed 3321.61 samples/sec Loss 2.6087 LearningRate 0.0096 Epoch: 13 Global Step: 171430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:35:12,240-Speed 3224.69 samples/sec Loss 2.5822 LearningRate 0.0096 Epoch: 13 Global Step: 171440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:35:15,336-Speed 3308.10 samples/sec Loss 2.5801 LearningRate 0.0096 Epoch: 13 Global Step: 171450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:35:18,431-Speed 3309.74 samples/sec Loss 2.7104 LearningRate 0.0096 Epoch: 13 Global Step: 171460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:35:21,492-Speed 3345.97 samples/sec Loss 2.6797 LearningRate 0.0096 Epoch: 13 Global Step: 171470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:35:24,591-Speed 3306.11 samples/sec Loss 2.5962 LearningRate 0.0096 Epoch: 13 Global Step: 171480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:35:27,756-Speed 3235.46 samples/sec Loss 2.6502 LearningRate 0.0096 Epoch: 13 Global Step: 171490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:35:30,835-Speed 3327.18 samples/sec Loss 2.6990 LearningRate 0.0096 Epoch: 13 Global Step: 171500 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:33,918-Speed 3322.55 samples/sec Loss 2.5740 LearningRate 0.0096 Epoch: 13 Global Step: 171510 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:37,011-Speed 3311.62 samples/sec Loss 2.6409 LearningRate 0.0096 Epoch: 13 Global Step: 171520 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:40,132-Speed 3282.79 samples/sec Loss 2.6707 LearningRate 0.0096 Epoch: 13 Global Step: 171530 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 
16:35:43,286-Speed 3247.09 samples/sec Loss 2.5860 LearningRate 0.0096 Epoch: 13 Global Step: 171540 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:46,343-Speed 3350.70 samples/sec Loss 2.6007 LearningRate 0.0096 Epoch: 13 Global Step: 171550 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:49,446-Speed 3301.71 samples/sec Loss 2.7038 LearningRate 0.0096 Epoch: 13 Global Step: 171560 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:52,522-Speed 3330.05 samples/sec Loss 2.5774 LearningRate 0.0096 Epoch: 13 Global Step: 171570 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:55,623-Speed 3302.75 samples/sec Loss 2.6109 LearningRate 0.0096 Epoch: 13 Global Step: 171580 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:35:58,704-Speed 3324.38 samples/sec Loss 2.6910 LearningRate 0.0096 Epoch: 13 Global Step: 171590 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:36:01,897-Speed 3207.91 samples/sec Loss 2.6750 LearningRate 0.0096 Epoch: 13 Global Step: 171600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:04,960-Speed 3344.68 samples/sec Loss 2.6605 LearningRate 0.0096 Epoch: 13 Global Step: 171610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:08,029-Speed 3338.03 samples/sec Loss 2.6757 LearningRate 0.0096 Epoch: 13 Global Step: 171620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:11,162-Speed 3269.04 samples/sec Loss 2.6131 LearningRate 0.0096 Epoch: 13 Global Step: 171630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:14,274-Speed 3291.54 samples/sec Loss 2.6636 LearningRate 0.0096 Epoch: 13 Global Step: 171640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:17,332-Speed 3349.86 samples/sec Loss 2.6383 LearningRate 0.0096 Epoch: 13 Global Step: 171650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:20,406-Speed 3332.44 samples/sec Loss 
2.5580 LearningRate 0.0095 Epoch: 13 Global Step: 171660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:23,501-Speed 3309.83 samples/sec Loss 2.6325 LearningRate 0.0095 Epoch: 13 Global Step: 171670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:26,647-Speed 3255.71 samples/sec Loss 2.6021 LearningRate 0.0095 Epoch: 13 Global Step: 171680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:29,746-Speed 3306.03 samples/sec Loss 2.5388 LearningRate 0.0095 Epoch: 13 Global Step: 171690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:32,868-Speed 3281.14 samples/sec Loss 2.5965 LearningRate 0.0095 Epoch: 13 Global Step: 171700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:36:35,967-Speed 3305.21 samples/sec Loss 2.6139 LearningRate 0.0095 Epoch: 13 Global Step: 171710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:36:39,051-Speed 3322.00 samples/sec Loss 2.5732 LearningRate 0.0095 Epoch: 13 Global Step: 171720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:36:42,132-Speed 3323.98 samples/sec Loss 2.6209 LearningRate 0.0095 Epoch: 13 Global Step: 171730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:36:45,207-Speed 3331.81 samples/sec Loss 2.5991 LearningRate 0.0095 Epoch: 13 Global Step: 171740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:36:48,241-Speed 3375.51 samples/sec Loss 2.5499 LearningRate 0.0095 Epoch: 13 Global Step: 171750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:51,345-Speed 3300.18 samples/sec Loss 2.6361 LearningRate 0.0095 Epoch: 13 Global Step: 171760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:54,443-Speed 3306.49 samples/sec Loss 2.7104 LearningRate 0.0095 Epoch: 13 Global Step: 171770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:36:57,520-Speed 3328.93 samples/sec Loss 2.6023 LearningRate 0.0095 Epoch: 13 Global 
Step: 171780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:00,602-Speed 3323.65 samples/sec Loss 2.5772 LearningRate 0.0095 Epoch: 13 Global Step: 171790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:03,712-Speed 3293.89 samples/sec Loss 2.6446 LearningRate 0.0095 Epoch: 13 Global Step: 171800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:06,812-Speed 3304.97 samples/sec Loss 2.6470 LearningRate 0.0095 Epoch: 13 Global Step: 171810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:09,860-Speed 3360.83 samples/sec Loss 2.6519 LearningRate 0.0095 Epoch: 13 Global Step: 171820 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:12,950-Speed 3314.14 samples/sec Loss 2.5984 LearningRate 0.0095 Epoch: 13 Global Step: 171830 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:16,069-Speed 3284.56 samples/sec Loss 2.7208 LearningRate 0.0095 Epoch: 13 Global Step: 171840 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:19,191-Speed 3281.39 samples/sec Loss 2.6176 LearningRate 0.0095 Epoch: 13 Global Step: 171850 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:22,288-Speed 3307.01 samples/sec Loss 2.6856 LearningRate 0.0095 Epoch: 13 Global Step: 171860 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:25,465-Speed 3224.05 samples/sec Loss 2.6357 LearningRate 0.0095 Epoch: 13 Global Step: 171870 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:28,584-Speed 3284.48 samples/sec Loss 2.6031 LearningRate 0.0095 Epoch: 13 Global Step: 171880 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:31,662-Speed 3327.75 samples/sec Loss 2.6199 LearningRate 0.0095 Epoch: 13 Global Step: 171890 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:34,725-Speed 3344.64 samples/sec Loss 2.6263 LearningRate 0.0095 Epoch: 13 Global Step: 171900 Fp16 Grad Scale: 8192 Required: 7 
hours Training: 2022-04-27 16:37:37,902-Speed 3223.64 samples/sec Loss 2.6767 LearningRate 0.0095 Epoch: 13 Global Step: 171910 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:40,992-Speed 3315.24 samples/sec Loss 2.6992 LearningRate 0.0095 Epoch: 13 Global Step: 171920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:44,056-Speed 3343.31 samples/sec Loss 2.6610 LearningRate 0.0095 Epoch: 13 Global Step: 171930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:47,134-Speed 3327.79 samples/sec Loss 2.5944 LearningRate 0.0095 Epoch: 13 Global Step: 171940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:50,264-Speed 3272.68 samples/sec Loss 2.6151 LearningRate 0.0095 Epoch: 13 Global Step: 171950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:53,326-Speed 3346.25 samples/sec Loss 2.6058 LearningRate 0.0095 Epoch: 13 Global Step: 171960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:37:56,368-Speed 3367.02 samples/sec Loss 2.6382 LearningRate 0.0095 Epoch: 13 Global Step: 171970 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:37:59,486-Speed 3285.02 samples/sec Loss 2.6687 LearningRate 0.0095 Epoch: 13 Global Step: 171980 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:02,567-Speed 3324.41 samples/sec Loss 2.7209 LearningRate 0.0095 Epoch: 13 Global Step: 171990 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:05,635-Speed 3339.40 samples/sec Loss 2.6572 LearningRate 0.0095 Epoch: 13 Global Step: 172000 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:08,700-Speed 3341.46 samples/sec Loss 2.5914 LearningRate 0.0095 Epoch: 13 Global Step: 172010 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:11,804-Speed 3300.58 samples/sec Loss 2.6224 LearningRate 0.0095 Epoch: 13 Global Step: 172020 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:14,928-Speed 
3279.36 samples/sec Loss 2.6691 LearningRate 0.0095 Epoch: 13 Global Step: 172030 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:18,050-Speed 3280.65 samples/sec Loss 2.6460 LearningRate 0.0095 Epoch: 13 Global Step: 172040 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:21,137-Speed 3318.36 samples/sec Loss 2.7280 LearningRate 0.0095 Epoch: 13 Global Step: 172050 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:24,236-Speed 3305.01 samples/sec Loss 2.6422 LearningRate 0.0094 Epoch: 13 Global Step: 172060 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:38:27,427-Speed 3210.46 samples/sec Loss 2.5895 LearningRate 0.0094 Epoch: 13 Global Step: 172070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:30,529-Speed 3301.46 samples/sec Loss 2.6301 LearningRate 0.0094 Epoch: 13 Global Step: 172080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:33,594-Speed 3342.74 samples/sec Loss 2.6652 LearningRate 0.0094 Epoch: 13 Global Step: 172090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:36,671-Speed 3328.76 samples/sec Loss 2.6287 LearningRate 0.0094 Epoch: 13 Global Step: 172100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:39,750-Speed 3326.37 samples/sec Loss 2.6483 LearningRate 0.0094 Epoch: 13 Global Step: 172110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:42,848-Speed 3307.27 samples/sec Loss 2.6521 LearningRate 0.0094 Epoch: 13 Global Step: 172120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:45,918-Speed 3336.92 samples/sec Loss 2.6567 LearningRate 0.0094 Epoch: 13 Global Step: 172130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:48,999-Speed 3324.35 samples/sec Loss 2.5928 LearningRate 0.0094 Epoch: 13 Global Step: 172140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:52,099-Speed 3303.93 samples/sec Loss 2.6485 LearningRate 
0.0094 Epoch: 13 Global Step: 172150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:55,260-Speed 3241.04 samples/sec Loss 2.6676 LearningRate 0.0094 Epoch: 13 Global Step: 172160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:38:58,328-Speed 3337.89 samples/sec Loss 2.5868 LearningRate 0.0094 Epoch: 13 Global Step: 172170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:39:01,463-Speed 3267.15 samples/sec Loss 2.6243 LearningRate 0.0094 Epoch: 13 Global Step: 172180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:04,536-Speed 3333.58 samples/sec Loss 2.6830 LearningRate 0.0094 Epoch: 13 Global Step: 172190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:07,587-Speed 3357.92 samples/sec Loss 2.6443 LearningRate 0.0094 Epoch: 13 Global Step: 172200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:10,643-Speed 3351.83 samples/sec Loss 2.6041 LearningRate 0.0094 Epoch: 13 Global Step: 172210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:13,732-Speed 3315.57 samples/sec Loss 2.6607 LearningRate 0.0094 Epoch: 13 Global Step: 172220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:16,836-Speed 3300.34 samples/sec Loss 2.6299 LearningRate 0.0094 Epoch: 13 Global Step: 172230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:19,901-Speed 3341.57 samples/sec Loss 2.5874 LearningRate 0.0094 Epoch: 13 Global Step: 172240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:22,989-Speed 3318.24 samples/sec Loss 2.6732 LearningRate 0.0094 Epoch: 13 Global Step: 172250 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:26,088-Speed 3305.14 samples/sec Loss 2.6902 LearningRate 0.0094 Epoch: 13 Global Step: 172260 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:29,177-Speed 3315.33 samples/sec Loss 2.6787 LearningRate 0.0094 Epoch: 13 Global Step: 172270 Fp16 
Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:32,274-Speed 3307.99 samples/sec Loss 2.6828 LearningRate 0.0094 Epoch: 13 Global Step: 172280 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:35,414-Speed 3261.67 samples/sec Loss 2.6020 LearningRate 0.0094 Epoch: 13 Global Step: 172290 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:38,603-Speed 3212.38 samples/sec Loss 2.6629 LearningRate 0.0094 Epoch: 13 Global Step: 172300 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:41,698-Speed 3310.10 samples/sec Loss 2.6406 LearningRate 0.0094 Epoch: 13 Global Step: 172310 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:44,771-Speed 3333.59 samples/sec Loss 2.5721 LearningRate 0.0094 Epoch: 13 Global Step: 172320 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:47,890-Speed 3283.24 samples/sec Loss 2.6045 LearningRate 0.0094 Epoch: 13 Global Step: 172330 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:51,065-Speed 3227.06 samples/sec Loss 2.6769 LearningRate 0.0094 Epoch: 13 Global Step: 172340 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:39:54,196-Speed 3271.07 samples/sec Loss 2.6222 LearningRate 0.0094 Epoch: 13 Global Step: 172350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:39:57,264-Speed 3338.98 samples/sec Loss 2.6201 LearningRate 0.0094 Epoch: 13 Global Step: 172360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:00,380-Speed 3287.15 samples/sec Loss 2.6435 LearningRate 0.0094 Epoch: 13 Global Step: 172370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:03,590-Speed 3191.03 samples/sec Loss 2.6170 LearningRate 0.0094 Epoch: 13 Global Step: 172380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:06,700-Speed 3294.05 samples/sec Loss 2.6647 LearningRate 0.0094 Epoch: 13 Global Step: 172390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 
2022-04-27 16:40:09,764-Speed 3342.38 samples/sec Loss 2.6579 LearningRate 0.0094 Epoch: 13 Global Step: 172400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:12,857-Speed 3312.58 samples/sec Loss 2.6554 LearningRate 0.0094 Epoch: 13 Global Step: 172410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:15,924-Speed 3340.30 samples/sec Loss 2.6939 LearningRate 0.0094 Epoch: 13 Global Step: 172420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:18,995-Speed 3334.99 samples/sec Loss 2.6777 LearningRate 0.0094 Epoch: 13 Global Step: 172430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:22,095-Speed 3304.63 samples/sec Loss 2.5931 LearningRate 0.0094 Epoch: 13 Global Step: 172440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:25,205-Speed 3293.52 samples/sec Loss 2.6440 LearningRate 0.0094 Epoch: 13 Global Step: 172450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:40:28,317-Speed 3291.74 samples/sec Loss 2.6098 LearningRate 0.0093 Epoch: 13 Global Step: 172460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:40:31,372-Speed 3353.20 samples/sec Loss 2.6152 LearningRate 0.0093 Epoch: 13 Global Step: 172470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:34,449-Speed 3328.97 samples/sec Loss 2.7240 LearningRate 0.0093 Epoch: 13 Global Step: 172480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:37,581-Speed 3270.50 samples/sec Loss 2.6676 LearningRate 0.0093 Epoch: 13 Global Step: 172490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:40,657-Speed 3329.40 samples/sec Loss 2.6090 LearningRate 0.0093 Epoch: 13 Global Step: 172500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:43,712-Speed 3352.89 samples/sec Loss 2.6296 LearningRate 0.0093 Epoch: 13 Global Step: 172510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:46,766-Speed 3354.04 
samples/sec Loss 2.5942 LearningRate 0.0093 Epoch: 13 Global Step: 172520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:49,880-Speed 3289.66 samples/sec Loss 2.6570 LearningRate 0.0093 Epoch: 13 Global Step: 172530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:52,992-Speed 3291.41 samples/sec Loss 2.6013 LearningRate 0.0093 Epoch: 13 Global Step: 172540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:56,122-Speed 3272.73 samples/sec Loss 2.5487 LearningRate 0.0093 Epoch: 13 Global Step: 172550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:40:59,210-Speed 3316.87 samples/sec Loss 2.5979 LearningRate 0.0093 Epoch: 13 Global Step: 172560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:41:02,348-Speed 3265.20 samples/sec Loss 2.6013 LearningRate 0.0093 Epoch: 13 Global Step: 172570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:41:05,482-Speed 3268.64 samples/sec Loss 2.6699 LearningRate 0.0093 Epoch: 13 Global Step: 172580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:41:08,589-Speed 3296.12 samples/sec Loss 2.6037 LearningRate 0.0093 Epoch: 13 Global Step: 172590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:41:11,736-Speed 3255.01 samples/sec Loss 2.5830 LearningRate 0.0093 Epoch: 13 Global Step: 172600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:41:14,874-Speed 3264.76 samples/sec Loss 2.6003 LearningRate 0.0093 Epoch: 13 Global Step: 172610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:41:17,949-Speed 3330.60 samples/sec Loss 2.5732 LearningRate 0.0093 Epoch: 13 Global Step: 172620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:41:21,029-Speed 3325.84 samples/sec Loss 2.6379 LearningRate 0.0093 Epoch: 13 Global Step: 172630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:41:24,076-Speed 3361.52 samples/sec Loss 2.6737 LearningRate 0.0093 
Epoch: 13 Global Step: 172640 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:41:27,204-Speed 3274.76 samples/sec Loss 2.6011 LearningRate 0.0093 Epoch: 13 Global Step: 172650 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:41:30,326-Speed 3280.70 samples/sec Loss 2.6455 LearningRate 0.0093 Epoch: 13 Global Step: 172660 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:41:33,403-Speed 3328.92 samples/sec Loss 2.6548 LearningRate 0.0093 Epoch: 13 Global Step: 172670 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:41:36,491-Speed 3317.57 samples/sec Loss 2.6513 LearningRate 0.0093 Epoch: 13 Global Step: 172680 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:41:39,614-Speed 3279.84 samples/sec Loss 2.5935 LearningRate 0.0093 Epoch: 13 Global Step: 172690 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:41:42,739-Speed 3277.71 samples/sec Loss 2.6018 LearningRate 0.0093 Epoch: 13 Global Step: 172700 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:41:45,828-Speed 3316.73 samples/sec Loss 2.6322 LearningRate 0.0093 Epoch: 13 Global Step: 172710 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:41:48,914-Speed 3318.89 samples/sec Loss 2.5752 LearningRate 0.0093 Epoch: 13 Global Step: 172720 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:41:52,006-Speed 3313.03 samples/sec Loss 2.6024 LearningRate 0.0093 Epoch: 13 Global Step: 172730 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:41:55,097-Speed 3314.56 samples/sec Loss 2.5885 LearningRate 0.0093 Epoch: 13 Global Step: 172740 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:41:58,191-Speed 3310.18 samples/sec Loss 2.6802 LearningRate 0.0093 Epoch: 13 Global Step: 172750 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:42:01,322-Speed 3272.15 samples/sec Loss 2.6143 LearningRate 0.0093 Epoch: 13 Global Step: 172760 Fp16 Grad Scale: 4096 
Required: 7 hours Training: 2022-04-27 16:42:04,536-Speed 3186.76 samples/sec Loss 2.7341 LearningRate 0.0093 Epoch: 13 Global Step: 172770 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:42:07,676-Speed 3261.84 samples/sec Loss 2.6711 LearningRate 0.0093 Epoch: 13 Global Step: 172780 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:42:10,745-Speed 3337.60 samples/sec Loss 2.6013 LearningRate 0.0093 Epoch: 13 Global Step: 172790 Fp16 Grad Scale: 4096 Required: 7 hours Training: 2022-04-27 16:42:13,865-Speed 3283.67 samples/sec Loss 2.6374 LearningRate 0.0093 Epoch: 13 Global Step: 172800 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:16,957-Speed 3312.05 samples/sec Loss 2.5675 LearningRate 0.0093 Epoch: 13 Global Step: 172810 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:20,040-Speed 3322.66 samples/sec Loss 2.6166 LearningRate 0.0093 Epoch: 13 Global Step: 172820 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:23,131-Speed 3314.02 samples/sec Loss 2.6011 LearningRate 0.0093 Epoch: 13 Global Step: 172830 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:26,226-Speed 3309.09 samples/sec Loss 2.6035 LearningRate 0.0093 Epoch: 13 Global Step: 172840 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:29,325-Speed 3305.50 samples/sec Loss 2.6967 LearningRate 0.0093 Epoch: 13 Global Step: 172850 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:32,399-Speed 3332.86 samples/sec Loss 2.6099 LearningRate 0.0093 Epoch: 13 Global Step: 172860 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:35,486-Speed 3317.82 samples/sec Loss 2.6130 LearningRate 0.0092 Epoch: 13 Global Step: 172870 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:38,634-Speed 3254.26 samples/sec Loss 2.6285 LearningRate 0.0092 Epoch: 13 Global Step: 172880 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 
16:42:41,697-Speed 3344.20 samples/sec Loss 2.6340 LearningRate 0.0092 Epoch: 13 Global Step: 172890 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 16:42:44,768-Speed 3335.68 samples/sec Loss 2.7141 LearningRate 0.0092 Epoch: 13 Global Step: 172900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:42:47,833-Speed 3342.11 samples/sec Loss 2.5899 LearningRate 0.0092 Epoch: 13 Global Step: 172910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:42:50,911-Speed 3327.81 samples/sec Loss 2.5722 LearningRate 0.0092 Epoch: 13 Global Step: 172920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:42:54,021-Speed 3294.05 samples/sec Loss 2.5554 LearningRate 0.0092 Epoch: 13 Global Step: 172930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:42:57,092-Speed 3335.74 samples/sec Loss 2.6930 LearningRate 0.0092 Epoch: 13 Global Step: 172940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:00,231-Speed 3263.31 samples/sec Loss 2.6295 LearningRate 0.0092 Epoch: 13 Global Step: 172950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:03,328-Speed 3307.16 samples/sec Loss 2.5646 LearningRate 0.0092 Epoch: 13 Global Step: 172960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:06,405-Speed 3329.43 samples/sec Loss 2.6505 LearningRate 0.0092 Epoch: 13 Global Step: 172970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:09,483-Speed 3327.68 samples/sec Loss 2.6019 LearningRate 0.0092 Epoch: 13 Global Step: 172980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:12,575-Speed 3312.88 samples/sec Loss 2.5763 LearningRate 0.0092 Epoch: 13 Global Step: 172990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:15,712-Speed 3265.67 samples/sec Loss 2.6889 LearningRate 0.0092 Epoch: 13 Global Step: 173000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:18,787-Speed 3330.01 samples/sec Loss 
2.6512 LearningRate 0.0092 Epoch: 13 Global Step: 173010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:21,852-Speed 3342.13 samples/sec Loss 2.6326 LearningRate 0.0092 Epoch: 13 Global Step: 173020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:25,551-Speed 2769.40 samples/sec Loss 2.6997 LearningRate 0.0092 Epoch: 13 Global Step: 173030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:28,615-Speed 3342.65 samples/sec Loss 2.6239 LearningRate 0.0092 Epoch: 13 Global Step: 173040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:31,675-Speed 3347.44 samples/sec Loss 2.6946 LearningRate 0.0092 Epoch: 13 Global Step: 173050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:34,738-Speed 3343.80 samples/sec Loss 2.6514 LearningRate 0.0092 Epoch: 13 Global Step: 173060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:37,837-Speed 3305.52 samples/sec Loss 2.6105 LearningRate 0.0092 Epoch: 13 Global Step: 173070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:40,931-Speed 3311.02 samples/sec Loss 2.6339 LearningRate 0.0092 Epoch: 13 Global Step: 173080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:44,060-Speed 3273.45 samples/sec Loss 2.5804 LearningRate 0.0092 Epoch: 13 Global Step: 173090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 16:43:47,118-Speed 3349.48 samples/sec Loss 2.6941 LearningRate 0.0092 Epoch: 13 Global Step: 173100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:50,305-Speed 3214.57 samples/sec Loss 2.6192 LearningRate 0.0092 Epoch: 13 Global Step: 173110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:53,490-Speed 3216.24 samples/sec Loss 2.6448 LearningRate 0.0092 Epoch: 13 Global Step: 173120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:56,550-Speed 3347.50 samples/sec Loss 2.6511 LearningRate 0.0092 Epoch: 13 Global 
Step: 173130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:43:59,656-Speed 3297.63 samples/sec Loss 2.6193 LearningRate 0.0092 Epoch: 13 Global Step: 173140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 16:44:02,842-Speed 3215.50 samples/sec Loss 2.5926 LearningRate 0.0092 Epoch: 13 Global Step: 173150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:05,991-Speed 3253.59 samples/sec Loss 2.6305 LearningRate 0.0092 Epoch: 13 Global Step: 173160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:09,067-Speed 3329.72 samples/sec Loss 2.6793 LearningRate 0.0092 Epoch: 13 Global Step: 173170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:12,140-Speed 3333.18 samples/sec Loss 2.5897 LearningRate 0.0092 Epoch: 13 Global Step: 173180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:15,225-Speed 3320.66 samples/sec Loss 2.5729 LearningRate 0.0092 Epoch: 13 Global Step: 173190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:18,349-Speed 3278.55 samples/sec Loss 2.6613 LearningRate 0.0092 Epoch: 13 Global Step: 173200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:44:21,419-Speed 3337.04 samples/sec Loss 2.6248 LearningRate 0.0092 Epoch: 13 Global Step: 173210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:44:24,493-Speed 3332.46 samples/sec Loss 2.6279 LearningRate 0.0092 Epoch: 13 Global Step: 173220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:27,612-Speed 3283.92 samples/sec Loss 2.5597 LearningRate 0.0092 Epoch: 13 Global Step: 173230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:30,705-Speed 3312.32 samples/sec Loss 2.6326 LearningRate 0.0092 Epoch: 13 Global Step: 173240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:33,810-Speed 3298.80 samples/sec Loss 2.6193 LearningRate 0.0092 Epoch: 13 Global Step: 173250 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 16:44:36,968-Speed 3243.16 samples/sec Loss 2.6272 LearningRate 0.0092 Epoch: 13 Global Step: 173260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:40,150-Speed 3219.01 samples/sec Loss 2.6704 LearningRate 0.0092 Epoch: 13 Global Step: 173270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:43,289-Speed 3263.87 samples/sec Loss 2.6950 LearningRate 0.0091 Epoch: 13 Global Step: 173280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:46,376-Speed 3317.65 samples/sec Loss 2.7111 LearningRate 0.0091 Epoch: 13 Global Step: 173290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:49,471-Speed 3310.26 samples/sec Loss 2.6210 LearningRate 0.0091 Epoch: 13 Global Step: 173300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:52,548-Speed 3328.64 samples/sec Loss 2.6262 LearningRate 0.0091 Epoch: 13 Global Step: 173310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:44:55,601-Speed 3355.06 samples/sec Loss 2.6274 LearningRate 0.0091 Epoch: 13 Global Step: 173320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:44:58,694-Speed 3311.27 samples/sec Loss 2.5999 LearningRate 0.0091 Epoch: 13 Global Step: 173330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:45:01,792-Speed 3306.38 samples/sec Loss 2.6226 LearningRate 0.0091 Epoch: 13 Global Step: 173340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:04,908-Speed 3287.52 samples/sec Loss 2.6798 LearningRate 0.0091 Epoch: 13 Global Step: 173350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:08,082-Speed 3226.88 samples/sec Loss 2.6045 LearningRate 0.0091 Epoch: 13 Global Step: 173360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:11,216-Speed 3268.80 samples/sec Loss 2.6849 LearningRate 0.0091 Epoch: 13 Global Step: 173370 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 
16:45:14,329-Speed 3290.19 samples/sec Loss 2.6791 LearningRate 0.0091 Epoch: 13 Global Step: 173380 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:17,386-Speed 3351.66 samples/sec Loss 2.6972 LearningRate 0.0091 Epoch: 13 Global Step: 173390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:20,506-Speed 3283.06 samples/sec Loss 2.5980 LearningRate 0.0091 Epoch: 13 Global Step: 173400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:23,629-Speed 3279.52 samples/sec Loss 2.6222 LearningRate 0.0091 Epoch: 13 Global Step: 173410 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:26,760-Speed 3271.93 samples/sec Loss 2.6519 LearningRate 0.0091 Epoch: 13 Global Step: 173420 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:29,846-Speed 3319.19 samples/sec Loss 2.6061 LearningRate 0.0091 Epoch: 13 Global Step: 173430 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:32,904-Speed 3349.99 samples/sec Loss 2.6141 LearningRate 0.0091 Epoch: 13 Global Step: 173440 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:35,994-Speed 3314.47 samples/sec Loss 2.6594 LearningRate 0.0091 Epoch: 13 Global Step: 173450 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:39,187-Speed 3208.49 samples/sec Loss 2.5741 LearningRate 0.0091 Epoch: 13 Global Step: 173460 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:45:42,324-Speed 3265.63 samples/sec Loss 2.6035 LearningRate 0.0091 Epoch: 13 Global Step: 173470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:45,371-Speed 3361.20 samples/sec Loss 2.6416 LearningRate 0.0091 Epoch: 13 Global Step: 173480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:48,449-Speed 3328.28 samples/sec Loss 2.6060 LearningRate 0.0091 Epoch: 13 Global Step: 173490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:51,533-Speed 3321.62 samples/sec Loss 2.6693 
LearningRate 0.0091 Epoch: 13 Global Step: 173500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:54,689-Speed 3246.24 samples/sec Loss 2.7136 LearningRate 0.0091 Epoch: 13 Global Step: 173510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:45:57,796-Speed 3296.82 samples/sec Loss 2.6460 LearningRate 0.0091 Epoch: 13 Global Step: 173520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:46:00,944-Speed 3253.50 samples/sec Loss 2.7162 LearningRate 0.0091 Epoch: 13 Global Step: 173530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:46:04,051-Speed 3297.36 samples/sec Loss 2.6475 LearningRate 0.0091 Epoch: 13 Global Step: 173540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:46:07,166-Speed 3288.50 samples/sec Loss 2.6806 LearningRate 0.0091 Epoch: 13 Global Step: 173550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:46:10,246-Speed 3325.84 samples/sec Loss 2.6637 LearningRate 0.0091 Epoch: 13 Global Step: 173560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:46:13,322-Speed 3329.08 samples/sec Loss 2.6674 LearningRate 0.0091 Epoch: 13 Global Step: 173570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:46:16,447-Speed 3278.56 samples/sec Loss 2.6200 LearningRate 0.0091 Epoch: 13 Global Step: 173580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:46:19,571-Speed 3278.70 samples/sec Loss 2.6765 LearningRate 0.0091 Epoch: 13 Global Step: 173590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:46:22,630-Speed 3348.17 samples/sec Loss 2.6559 LearningRate 0.0091 Epoch: 13 Global Step: 173600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:46:25,816-Speed 3214.81 samples/sec Loss 2.5801 LearningRate 0.0091 Epoch: 13 Global Step: 173610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:46:28,972-Speed 3246.10 samples/sec Loss 2.6361 LearningRate 0.0091 Epoch: 13 Global Step: 
173620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:46:32,049-Speed 3328.43 samples/sec Loss 2.5999 LearningRate 0.0091 Epoch: 13 Global Step: 173630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:46:35,147-Speed 3306.70 samples/sec Loss 2.6309 LearningRate 0.0091 Epoch: 13 Global Step: 173640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:46:38,345-Speed 3202.95 samples/sec Loss 2.6350 LearningRate 0.0091 Epoch: 13 Global Step: 173650 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:46:41,410-Speed 3342.15 samples/sec Loss 2.6452 LearningRate 0.0091 Epoch: 13 Global Step: 173660 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:46:44,511-Speed 3303.45 samples/sec Loss 2.6994 LearningRate 0.0091 Epoch: 13 Global Step: 173670 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:46:47,619-Speed 3295.84 samples/sec Loss 2.6171 LearningRate 0.0091 Epoch: 13 Global Step: 173680 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:46:50,757-Speed 3263.24 samples/sec Loss 2.6669 LearningRate 0.0090 Epoch: 13 Global Step: 173690 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:46:53,890-Speed 3269.64 samples/sec Loss 2.6094 LearningRate 0.0090 Epoch: 13 Global Step: 173700 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:46:56,990-Speed 3304.87 samples/sec Loss 2.6229 LearningRate 0.0090 Epoch: 13 Global Step: 173710 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:47:00,115-Speed 3278.08 samples/sec Loss 2.6574 LearningRate 0.0090 Epoch: 13 Global Step: 173720 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:47:03,232-Speed 3286.18 samples/sec Loss 2.6323 LearningRate 0.0090 Epoch: 13 Global Step: 173730 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:47:06,443-Speed 3189.75 samples/sec Loss 2.5589 LearningRate 0.0090 Epoch: 13 Global Step: 173740 Fp16 Grad Scale: 8192 Required: 6 hours 
Training: 2022-04-27 16:47:09,532-Speed 3316.12 samples/sec Loss 2.6719 LearningRate 0.0090 Epoch: 13 Global Step: 173750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:12,744-Speed 3189.24 samples/sec Loss 2.6826 LearningRate 0.0090 Epoch: 13 Global Step: 173760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:15,913-Speed 3231.94 samples/sec Loss 2.6745 LearningRate 0.0090 Epoch: 13 Global Step: 173770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:19,069-Speed 3246.01 samples/sec Loss 2.6120 LearningRate 0.0090 Epoch: 13 Global Step: 173780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:22,125-Speed 3350.71 samples/sec Loss 2.6520 LearningRate 0.0090 Epoch: 13 Global Step: 173790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:25,258-Speed 3269.91 samples/sec Loss 2.5880 LearningRate 0.0090 Epoch: 13 Global Step: 173800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:28,406-Speed 3254.28 samples/sec Loss 2.5462 LearningRate 0.0090 Epoch: 13 Global Step: 173810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:31,465-Speed 3347.87 samples/sec Loss 2.6817 LearningRate 0.0090 Epoch: 13 Global Step: 173820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:34,540-Speed 3331.75 samples/sec Loss 2.6123 LearningRate 0.0090 Epoch: 13 Global Step: 173830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:37,644-Speed 3299.68 samples/sec Loss 2.5878 LearningRate 0.0090 Epoch: 13 Global Step: 173840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:47:40,744-Speed 3304.53 samples/sec Loss 2.6542 LearningRate 0.0090 Epoch: 13 Global Step: 173850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:47:43,872-Speed 3274.38 samples/sec Loss 2.5466 LearningRate 0.0090 Epoch: 13 Global Step: 173860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:47:47,006-Speed 
3268.87 samples/sec Loss 2.6258 LearningRate 0.0090 Epoch: 13 Global Step: 173870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:47:50,151-Speed 3256.90 samples/sec Loss 2.6290 LearningRate 0.0090 Epoch: 13 Global Step: 173880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:47:53,494-Speed 3064.13 samples/sec Loss 2.5898 LearningRate 0.0090 Epoch: 13 Global Step: 173890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:25,275-Speed 322.22 samples/sec Loss 2.1452 LearningRate 0.0090 Epoch: 14 Global Step: 173900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:28,772-Speed 2928.97 samples/sec Loss 1.8827 LearningRate 0.0090 Epoch: 14 Global Step: 173910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:31,862-Speed 3315.48 samples/sec Loss 1.8367 LearningRate 0.0090 Epoch: 14 Global Step: 173920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:34,942-Speed 3325.54 samples/sec Loss 1.8600 LearningRate 0.0090 Epoch: 14 Global Step: 173930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:38,098-Speed 3245.78 samples/sec Loss 1.9312 LearningRate 0.0090 Epoch: 14 Global Step: 173940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:41,382-Speed 3119.03 samples/sec Loss 1.8393 LearningRate 0.0090 Epoch: 14 Global Step: 173950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:44,671-Speed 3113.98 samples/sec Loss 1.9605 LearningRate 0.0090 Epoch: 14 Global Step: 173960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:48:47,784-Speed 3291.13 samples/sec Loss 1.8275 LearningRate 0.0090 Epoch: 14 Global Step: 173970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:48:51,114-Speed 3075.91 samples/sec Loss 1.9132 LearningRate 0.0090 Epoch: 14 Global Step: 173980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:48:54,334-Speed 3181.94 samples/sec Loss 1.8391 LearningRate 
0.0090 Epoch: 14 Global Step: 173990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:48:57,462-Speed 3274.28 samples/sec Loss 1.8381 LearningRate 0.0090 Epoch: 14 Global Step: 174000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:00,583-Speed 3282.63 samples/sec Loss 1.9060 LearningRate 0.0090 Epoch: 14 Global Step: 174010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:03,696-Speed 3289.77 samples/sec Loss 1.8798 LearningRate 0.0090 Epoch: 14 Global Step: 174020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:06,838-Speed 3260.59 samples/sec Loss 1.8843 LearningRate 0.0090 Epoch: 14 Global Step: 174030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:10,027-Speed 3211.74 samples/sec Loss 1.8833 LearningRate 0.0090 Epoch: 14 Global Step: 174040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:13,127-Speed 3303.92 samples/sec Loss 1.8353 LearningRate 0.0090 Epoch: 14 Global Step: 174050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:16,212-Speed 3320.79 samples/sec Loss 1.8239 LearningRate 0.0090 Epoch: 14 Global Step: 174060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:19,357-Speed 3257.50 samples/sec Loss 1.8999 LearningRate 0.0090 Epoch: 14 Global Step: 174070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:49:22,453-Speed 3307.69 samples/sec Loss 1.9062 LearningRate 0.0090 Epoch: 14 Global Step: 174080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:49:25,594-Speed 3260.99 samples/sec Loss 1.9150 LearningRate 0.0090 Epoch: 14 Global Step: 174090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:28,720-Speed 3277.06 samples/sec Loss 1.9007 LearningRate 0.0090 Epoch: 14 Global Step: 174100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:31,805-Speed 3320.42 samples/sec Loss 1.8363 LearningRate 0.0089 Epoch: 14 Global Step: 174110 Fp16 
Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:34,906-Speed 3302.66 samples/sec Loss 1.8254 LearningRate 0.0089 Epoch: 14 Global Step: 174120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:37,994-Speed 3317.82 samples/sec Loss 1.8803 LearningRate 0.0089 Epoch: 14 Global Step: 174130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:41,071-Speed 3328.28 samples/sec Loss 1.8723 LearningRate 0.0089 Epoch: 14 Global Step: 174140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:44,156-Speed 3320.53 samples/sec Loss 1.8572 LearningRate 0.0089 Epoch: 14 Global Step: 174150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:47,241-Speed 3321.06 samples/sec Loss 1.8296 LearningRate 0.0089 Epoch: 14 Global Step: 174160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:50,342-Speed 3303.10 samples/sec Loss 1.8572 LearningRate 0.0089 Epoch: 14 Global Step: 174170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:53,506-Speed 3236.96 samples/sec Loss 1.8510 LearningRate 0.0089 Epoch: 14 Global Step: 174180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:49:56,582-Speed 3330.11 samples/sec Loss 1.8996 LearningRate 0.0089 Epoch: 14 Global Step: 174190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:49:59,705-Speed 3279.94 samples/sec Loss 1.8954 LearningRate 0.0089 Epoch: 14 Global Step: 174200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:02,789-Speed 3321.32 samples/sec Loss 1.9169 LearningRate 0.0089 Epoch: 14 Global Step: 174210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:05,890-Speed 3303.04 samples/sec Loss 1.8037 LearningRate 0.0089 Epoch: 14 Global Step: 174220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:09,018-Speed 3275.19 samples/sec Loss 1.9112 LearningRate 0.0089 Epoch: 14 Global Step: 174230 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-04-27 16:50:12,124-Speed 3297.55 samples/sec Loss 1.9025 LearningRate 0.0089 Epoch: 14 Global Step: 174240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:15,266-Speed 3260.68 samples/sec Loss 1.9439 LearningRate 0.0089 Epoch: 14 Global Step: 174250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:18,375-Speed 3294.60 samples/sec Loss 1.9166 LearningRate 0.0089 Epoch: 14 Global Step: 174260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:21,466-Speed 3314.37 samples/sec Loss 1.8778 LearningRate 0.0089 Epoch: 14 Global Step: 174270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:24,539-Speed 3333.47 samples/sec Loss 1.8900 LearningRate 0.0089 Epoch: 14 Global Step: 174280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:27,731-Speed 3208.31 samples/sec Loss 1.8628 LearningRate 0.0089 Epoch: 14 Global Step: 174290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:50:30,780-Speed 3359.65 samples/sec Loss 1.9099 LearningRate 0.0089 Epoch: 14 Global Step: 174300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:33,851-Speed 3336.41 samples/sec Loss 1.8883 LearningRate 0.0089 Epoch: 14 Global Step: 174310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:36,918-Speed 3339.96 samples/sec Loss 1.9157 LearningRate 0.0089 Epoch: 14 Global Step: 174320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:39,987-Speed 3337.61 samples/sec Loss 1.8224 LearningRate 0.0089 Epoch: 14 Global Step: 174330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:43,130-Speed 3258.84 samples/sec Loss 1.8854 LearningRate 0.0089 Epoch: 14 Global Step: 174340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:46,193-Speed 3344.53 samples/sec Loss 1.8483 LearningRate 0.0089 Epoch: 14 Global Step: 174350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:49,338-Speed 
3256.90 samples/sec Loss 1.8486 LearningRate 0.0089 Epoch: 14 Global Step: 174360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:52,421-Speed 3321.81 samples/sec Loss 1.8596 LearningRate 0.0089 Epoch: 14 Global Step: 174370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:55,522-Speed 3303.37 samples/sec Loss 1.9131 LearningRate 0.0089 Epoch: 14 Global Step: 174380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:50:58,590-Speed 3338.47 samples/sec Loss 1.8610 LearningRate 0.0089 Epoch: 14 Global Step: 174390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:01,778-Speed 3213.69 samples/sec Loss 1.9260 LearningRate 0.0089 Epoch: 14 Global Step: 174400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:51:04,908-Speed 3272.21 samples/sec Loss 1.8851 LearningRate 0.0089 Epoch: 14 Global Step: 174410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:08,005-Speed 3307.54 samples/sec Loss 1.9393 LearningRate 0.0089 Epoch: 14 Global Step: 174420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:11,078-Speed 3333.57 samples/sec Loss 1.8572 LearningRate 0.0089 Epoch: 14 Global Step: 174430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:14,190-Speed 3290.85 samples/sec Loss 1.8995 LearningRate 0.0089 Epoch: 14 Global Step: 174440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:17,286-Speed 3308.98 samples/sec Loss 1.9366 LearningRate 0.0089 Epoch: 14 Global Step: 174450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:20,366-Speed 3325.56 samples/sec Loss 1.8587 LearningRate 0.0089 Epoch: 14 Global Step: 174460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:23,432-Speed 3341.91 samples/sec Loss 1.9558 LearningRate 0.0089 Epoch: 14 Global Step: 174470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:26,628-Speed 3204.55 samples/sec Loss 1.9233 
LearningRate 0.0089 Epoch: 14 Global Step: 174480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:29,763-Speed 3266.99 samples/sec Loss 1.9157 LearningRate 0.0089 Epoch: 14 Global Step: 174490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:32,860-Speed 3307.93 samples/sec Loss 1.9011 LearningRate 0.0089 Epoch: 14 Global Step: 174500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:36,029-Speed 3232.15 samples/sec Loss 1.9233 LearningRate 0.0089 Epoch: 14 Global Step: 174510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:51:39,126-Speed 3307.40 samples/sec Loss 1.8283 LearningRate 0.0088 Epoch: 14 Global Step: 174520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:42,361-Speed 3166.47 samples/sec Loss 1.8732 LearningRate 0.0088 Epoch: 14 Global Step: 174530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:45,476-Speed 3288.24 samples/sec Loss 1.9256 LearningRate 0.0088 Epoch: 14 Global Step: 174540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:48,598-Speed 3280.61 samples/sec Loss 1.9216 LearningRate 0.0088 Epoch: 14 Global Step: 174550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:51,848-Speed 3152.10 samples/sec Loss 1.9517 LearningRate 0.0088 Epoch: 14 Global Step: 174560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:54,967-Speed 3284.06 samples/sec Loss 1.8934 LearningRate 0.0088 Epoch: 14 Global Step: 174570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:51:58,085-Speed 3284.97 samples/sec Loss 1.9525 LearningRate 0.0088 Epoch: 14 Global Step: 174580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:01,229-Speed 3258.26 samples/sec Loss 1.9440 LearningRate 0.0088 Epoch: 14 Global Step: 174590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:04,320-Speed 3313.11 samples/sec Loss 1.8890 LearningRate 0.0088 Epoch: 14 Global Step: 
174600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:07,394-Speed 3332.22 samples/sec Loss 1.9613 LearningRate 0.0088 Epoch: 14 Global Step: 174610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:10,451-Speed 3351.59 samples/sec Loss 1.9283 LearningRate 0.0088 Epoch: 14 Global Step: 174620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:52:13,613-Speed 3239.26 samples/sec Loss 1.8897 LearningRate 0.0088 Epoch: 14 Global Step: 174630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:16,687-Speed 3332.26 samples/sec Loss 1.9560 LearningRate 0.0088 Epoch: 14 Global Step: 174640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:19,734-Speed 3360.88 samples/sec Loss 1.9391 LearningRate 0.0088 Epoch: 14 Global Step: 174650 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:22,858-Speed 3279.72 samples/sec Loss 1.9169 LearningRate 0.0088 Epoch: 14 Global Step: 174660 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:26,031-Speed 3228.22 samples/sec Loss 1.9197 LearningRate 0.0088 Epoch: 14 Global Step: 174670 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:29,190-Speed 3242.61 samples/sec Loss 1.8242 LearningRate 0.0088 Epoch: 14 Global Step: 174680 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:32,244-Speed 3354.01 samples/sec Loss 1.9704 LearningRate 0.0088 Epoch: 14 Global Step: 174690 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:35,377-Speed 3269.31 samples/sec Loss 1.9368 LearningRate 0.0088 Epoch: 14 Global Step: 174700 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:38,538-Speed 3240.36 samples/sec Loss 1.9760 LearningRate 0.0088 Epoch: 14 Global Step: 174710 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:41,658-Speed 3282.68 samples/sec Loss 1.9218 LearningRate 0.0088 Epoch: 14 Global Step: 174720 Fp16 Grad Scale: 8192 Required: 6 hours 
Training: 2022-04-27 16:52:44,756-Speed 3306.47 samples/sec Loss 1.9568 LearningRate 0.0088 Epoch: 14 Global Step: 174730 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:47,873-Speed 3286.27 samples/sec Loss 1.9593 LearningRate 0.0088 Epoch: 14 Global Step: 174740 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 16:52:50,958-Speed 3320.30 samples/sec Loss 1.9070 LearningRate 0.0088 Epoch: 14 Global Step: 174750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:54,084-Speed 3276.67 samples/sec Loss 1.9156 LearningRate 0.0088 Epoch: 14 Global Step: 174760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:52:57,152-Speed 3338.77 samples/sec Loss 1.9735 LearningRate 0.0088 Epoch: 14 Global Step: 174770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:00,249-Speed 3307.42 samples/sec Loss 1.9354 LearningRate 0.0088 Epoch: 14 Global Step: 174780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:03,384-Speed 3267.62 samples/sec Loss 1.9120 LearningRate 0.0088 Epoch: 14 Global Step: 174790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:06,506-Speed 3281.64 samples/sec Loss 1.9409 LearningRate 0.0088 Epoch: 14 Global Step: 174800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:09,632-Speed 3276.19 samples/sec Loss 1.9255 LearningRate 0.0088 Epoch: 14 Global Step: 174810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:12,716-Speed 3321.28 samples/sec Loss 1.9304 LearningRate 0.0088 Epoch: 14 Global Step: 174820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:15,832-Speed 3287.86 samples/sec Loss 1.9823 LearningRate 0.0088 Epoch: 14 Global Step: 174830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:18,967-Speed 3267.34 samples/sec Loss 1.9485 LearningRate 0.0088 Epoch: 14 Global Step: 174840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:22,037-Speed 
3336.79 samples/sec Loss 1.9869 LearningRate 0.0088 Epoch: 14 Global Step: 174850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:53:25,181-Speed 3257.62 samples/sec Loss 1.9263 LearningRate 0.0088 Epoch: 14 Global Step: 174860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:53:28,364-Speed 3218.44 samples/sec Loss 1.9789 LearningRate 0.0088 Epoch: 14 Global Step: 174870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:53:31,468-Speed 3300.16 samples/sec Loss 1.9897 LearningRate 0.0088 Epoch: 14 Global Step: 174880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:34,666-Speed 3203.18 samples/sec Loss 1.9232 LearningRate 0.0088 Epoch: 14 Global Step: 174890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:37,830-Speed 3237.55 samples/sec Loss 1.9362 LearningRate 0.0088 Epoch: 14 Global Step: 174900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:40,975-Speed 3256.68 samples/sec Loss 1.9879 LearningRate 0.0088 Epoch: 14 Global Step: 174910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:44,101-Speed 3276.32 samples/sec Loss 1.8929 LearningRate 0.0088 Epoch: 14 Global Step: 174920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:47,157-Speed 3351.55 samples/sec Loss 1.9513 LearningRate 0.0088 Epoch: 14 Global Step: 174930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:50,263-Speed 3298.18 samples/sec Loss 1.9272 LearningRate 0.0087 Epoch: 14 Global Step: 174940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:53,523-Speed 3142.41 samples/sec Loss 1.9261 LearningRate 0.0087 Epoch: 14 Global Step: 174950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:56,656-Speed 3269.10 samples/sec Loss 1.9366 LearningRate 0.0087 Epoch: 14 Global Step: 174960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:53:59,799-Speed 3259.40 samples/sec Loss 1.9159 
LearningRate 0.0087 Epoch: 14 Global Step: 174970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:02,973-Speed 3227.20 samples/sec Loss 1.9461 LearningRate 0.0087 Epoch: 14 Global Step: 174980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:54:06,068-Speed 3309.16 samples/sec Loss 1.9608 LearningRate 0.0087 Epoch: 14 Global Step: 174990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:09,184-Speed 3287.76 samples/sec Loss 1.9447 LearningRate 0.0087 Epoch: 14 Global Step: 175000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:12,295-Speed 3291.95 samples/sec Loss 1.9684 LearningRate 0.0087 Epoch: 14 Global Step: 175010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:15,436-Speed 3262.10 samples/sec Loss 1.9048 LearningRate 0.0087 Epoch: 14 Global Step: 175020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:18,547-Speed 3292.13 samples/sec Loss 1.9826 LearningRate 0.0087 Epoch: 14 Global Step: 175030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:21,600-Speed 3354.60 samples/sec Loss 1.9716 LearningRate 0.0087 Epoch: 14 Global Step: 175040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:24,673-Speed 3333.66 samples/sec Loss 1.9094 LearningRate 0.0087 Epoch: 14 Global Step: 175050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:27,854-Speed 3220.46 samples/sec Loss 2.0004 LearningRate 0.0087 Epoch: 14 Global Step: 175060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:30,978-Speed 3278.26 samples/sec Loss 1.9013 LearningRate 0.0087 Epoch: 14 Global Step: 175070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:34,042-Speed 3343.55 samples/sec Loss 1.9670 LearningRate 0.0087 Epoch: 14 Global Step: 175080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:37,168-Speed 3276.13 samples/sec Loss 1.9090 LearningRate 0.0087 Epoch: 14 Global Step: 
175090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:54:40,280-Speed 3292.22 samples/sec Loss 1.8987 LearningRate 0.0087 Epoch: 14 Global Step: 175100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:54:43,437-Speed 3244.45 samples/sec Loss 1.9638 LearningRate 0.0087 Epoch: 14 Global Step: 175110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:54:46,523-Speed 3319.20 samples/sec Loss 1.9516 LearningRate 0.0087 Epoch: 14 Global Step: 175120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:54:49,599-Speed 3330.41 samples/sec Loss 1.9467 LearningRate 0.0087 Epoch: 14 Global Step: 175130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:52,672-Speed 3333.46 samples/sec Loss 1.9255 LearningRate 0.0087 Epoch: 14 Global Step: 175140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:55,822-Speed 3251.54 samples/sec Loss 1.9372 LearningRate 0.0087 Epoch: 14 Global Step: 175150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:54:58,898-Speed 3329.62 samples/sec Loss 1.8909 LearningRate 0.0087 Epoch: 14 Global Step: 175160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:01,990-Speed 3312.50 samples/sec Loss 1.9876 LearningRate 0.0087 Epoch: 14 Global Step: 175170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:05,159-Speed 3233.21 samples/sec Loss 1.9187 LearningRate 0.0087 Epoch: 14 Global Step: 175180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:08,234-Speed 3330.46 samples/sec Loss 1.8802 LearningRate 0.0087 Epoch: 14 Global Step: 175190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:11,333-Speed 3305.77 samples/sec Loss 1.9815 LearningRate 0.0087 Epoch: 14 Global Step: 175200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:14,444-Speed 3292.62 samples/sec Loss 1.9360 LearningRate 0.0087 Epoch: 14 Global Step: 175210 Fp16 Grad Scale: 16384 Required: 6 
hours Training: 2022-04-27 16:55:17,556-Speed 3290.81 samples/sec Loss 1.9741 LearningRate 0.0087 Epoch: 14 Global Step: 175220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:20,655-Speed 3305.47 samples/sec Loss 1.9742 LearningRate 0.0087 Epoch: 14 Global Step: 175230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:55:23,764-Speed 3294.72 samples/sec Loss 1.8973 LearningRate 0.0087 Epoch: 14 Global Step: 175240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:55:26,860-Speed 3308.50 samples/sec Loss 1.9826 LearningRate 0.0087 Epoch: 14 Global Step: 175250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:55:30,004-Speed 3257.45 samples/sec Loss 1.9138 LearningRate 0.0087 Epoch: 14 Global Step: 175260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:55:33,164-Speed 3241.86 samples/sec Loss 1.9105 LearningRate 0.0087 Epoch: 14 Global Step: 175270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:55:36,238-Speed 3331.97 samples/sec Loss 1.9696 LearningRate 0.0087 Epoch: 14 Global Step: 175280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:55:39,351-Speed 3290.77 samples/sec Loss 1.8995 LearningRate 0.0087 Epoch: 14 Global Step: 175290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:55:42,512-Speed 3240.26 samples/sec Loss 1.9636 LearningRate 0.0087 Epoch: 14 Global Step: 175300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:45,603-Speed 3313.57 samples/sec Loss 1.9617 LearningRate 0.0087 Epoch: 14 Global Step: 175310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:48,717-Speed 3290.22 samples/sec Loss 1.9020 LearningRate 0.0087 Epoch: 14 Global Step: 175320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:51,850-Speed 3269.09 samples/sec Loss 1.9718 LearningRate 0.0087 Epoch: 14 Global Step: 175330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
16:55:54,989-Speed 3263.19 samples/sec Loss 1.9301 LearningRate 0.0087 Epoch: 14 Global Step: 175340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:55:58,069-Speed 3325.97 samples/sec Loss 1.9419 LearningRate 0.0087 Epoch: 14 Global Step: 175350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:01,169-Speed 3304.51 samples/sec Loss 1.9362 LearningRate 0.0086 Epoch: 14 Global Step: 175360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:04,259-Speed 3314.84 samples/sec Loss 1.9913 LearningRate 0.0086 Epoch: 14 Global Step: 175370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:07,485-Speed 3175.29 samples/sec Loss 1.9704 LearningRate 0.0086 Epoch: 14 Global Step: 175380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:10,561-Speed 3329.67 samples/sec Loss 1.9251 LearningRate 0.0086 Epoch: 14 Global Step: 175390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:13,662-Speed 3303.05 samples/sec Loss 1.9334 LearningRate 0.0086 Epoch: 14 Global Step: 175400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:56:16,792-Speed 3273.67 samples/sec Loss 1.9593 LearningRate 0.0086 Epoch: 14 Global Step: 175410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:19,875-Speed 3321.49 samples/sec Loss 1.9613 LearningRate 0.0086 Epoch: 14 Global Step: 175420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:22,947-Speed 3335.15 samples/sec Loss 1.9613 LearningRate 0.0086 Epoch: 14 Global Step: 175430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:26,071-Speed 3278.96 samples/sec Loss 1.9569 LearningRate 0.0086 Epoch: 14 Global Step: 175440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:29,205-Speed 3267.80 samples/sec Loss 2.0932 LearningRate 0.0086 Epoch: 14 Global Step: 175450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:32,329-Speed 3279.06 samples/sec Loss 
1.9509 LearningRate 0.0086 Epoch: 14 Global Step: 175460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:35,453-Speed 3279.93 samples/sec Loss 1.9582 LearningRate 0.0086 Epoch: 14 Global Step: 175470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:38,581-Speed 3274.44 samples/sec Loss 2.0060 LearningRate 0.0086 Epoch: 14 Global Step: 175480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:41,695-Speed 3289.19 samples/sec Loss 1.9990 LearningRate 0.0086 Epoch: 14 Global Step: 175490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:44,823-Speed 3274.08 samples/sec Loss 1.9384 LearningRate 0.0086 Epoch: 14 Global Step: 175500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:56:47,898-Speed 3331.64 samples/sec Loss 2.0131 LearningRate 0.0086 Epoch: 14 Global Step: 175510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:56:51,027-Speed 3273.32 samples/sec Loss 1.9511 LearningRate 0.0086 Epoch: 14 Global Step: 175520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:56:54,183-Speed 3246.07 samples/sec Loss 1.9557 LearningRate 0.0086 Epoch: 14 Global Step: 175530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:56:57,247-Speed 3342.55 samples/sec Loss 1.9571 LearningRate 0.0086 Epoch: 14 Global Step: 175540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:00,305-Speed 3349.91 samples/sec Loss 1.9845 LearningRate 0.0086 Epoch: 14 Global Step: 175550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:03,450-Speed 3256.44 samples/sec Loss 1.9831 LearningRate 0.0086 Epoch: 14 Global Step: 175560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:06,659-Speed 3192.21 samples/sec Loss 1.9990 LearningRate 0.0086 Epoch: 14 Global Step: 175570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:09,731-Speed 3335.13 samples/sec Loss 1.9694 LearningRate 0.0086 Epoch: 14 Global 
Step: 175580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:12,857-Speed 3276.14 samples/sec Loss 2.0200 LearningRate 0.0086 Epoch: 14 Global Step: 175590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:15,995-Speed 3264.83 samples/sec Loss 1.9906 LearningRate 0.0086 Epoch: 14 Global Step: 175600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:19,110-Speed 3287.74 samples/sec Loss 2.0321 LearningRate 0.0086 Epoch: 14 Global Step: 175610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:22,168-Speed 3350.04 samples/sec Loss 1.9754 LearningRate 0.0086 Epoch: 14 Global Step: 175620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:25,372-Speed 3197.63 samples/sec Loss 2.0562 LearningRate 0.0086 Epoch: 14 Global Step: 175630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:57:28,471-Speed 3304.78 samples/sec Loss 2.0001 LearningRate 0.0086 Epoch: 14 Global Step: 175640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:31,627-Speed 3245.49 samples/sec Loss 2.0488 LearningRate 0.0086 Epoch: 14 Global Step: 175650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:34,685-Speed 3350.06 samples/sec Loss 2.0139 LearningRate 0.0086 Epoch: 14 Global Step: 175660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:37,740-Speed 3353.42 samples/sec Loss 1.9566 LearningRate 0.0086 Epoch: 14 Global Step: 175670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:40,805-Speed 3341.63 samples/sec Loss 1.9891 LearningRate 0.0086 Epoch: 14 Global Step: 175680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:43,908-Speed 3301.10 samples/sec Loss 1.9785 LearningRate 0.0086 Epoch: 14 Global Step: 175690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:47,011-Speed 3301.36 samples/sec Loss 1.9986 LearningRate 0.0086 Epoch: 14 Global Step: 175700 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 16:57:50,114-Speed 3300.86 samples/sec Loss 2.0256 LearningRate 0.0086 Epoch: 14 Global Step: 175710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:53,298-Speed 3217.13 samples/sec Loss 1.9707 LearningRate 0.0086 Epoch: 14 Global Step: 175720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:56,432-Speed 3268.17 samples/sec Loss 1.9935 LearningRate 0.0086 Epoch: 14 Global Step: 175730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:57:59,586-Speed 3247.85 samples/sec Loss 1.9700 LearningRate 0.0086 Epoch: 14 Global Step: 175740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:58:02,693-Speed 3297.03 samples/sec Loss 2.0070 LearningRate 0.0086 Epoch: 14 Global Step: 175750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:58:05,803-Speed 3294.02 samples/sec Loss 1.9906 LearningRate 0.0086 Epoch: 14 Global Step: 175760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:58:08,896-Speed 3311.00 samples/sec Loss 1.9767 LearningRate 0.0086 Epoch: 14 Global Step: 175770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:11,975-Speed 3326.89 samples/sec Loss 1.9913 LearningRate 0.0086 Epoch: 14 Global Step: 175780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:15,117-Speed 3260.11 samples/sec Loss 1.9756 LearningRate 0.0085 Epoch: 14 Global Step: 175790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:18,244-Speed 3275.68 samples/sec Loss 1.9747 LearningRate 0.0085 Epoch: 14 Global Step: 175800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:21,310-Speed 3340.41 samples/sec Loss 2.0458 LearningRate 0.0085 Epoch: 14 Global Step: 175810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:24,373-Speed 3345.26 samples/sec Loss 1.9951 LearningRate 0.0085 Epoch: 14 Global Step: 175820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
16:58:27,615-Speed 3159.12 samples/sec Loss 2.0442 LearningRate 0.0085 Epoch: 14 Global Step: 175830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:30,825-Speed 3191.28 samples/sec Loss 2.0872 LearningRate 0.0085 Epoch: 14 Global Step: 175840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:33,885-Speed 3347.76 samples/sec Loss 1.9815 LearningRate 0.0085 Epoch: 14 Global Step: 175850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:37,011-Speed 3276.19 samples/sec Loss 2.0104 LearningRate 0.0085 Epoch: 14 Global Step: 175860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:40,064-Speed 3355.17 samples/sec Loss 1.9848 LearningRate 0.0085 Epoch: 14 Global Step: 175870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:58:43,226-Speed 3239.30 samples/sec Loss 1.9989 LearningRate 0.0085 Epoch: 14 Global Step: 175880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:58:46,267-Speed 3369.43 samples/sec Loss 1.9774 LearningRate 0.0085 Epoch: 14 Global Step: 175890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:49,363-Speed 3307.52 samples/sec Loss 2.0176 LearningRate 0.0085 Epoch: 14 Global Step: 175900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:52,520-Speed 3245.32 samples/sec Loss 2.0706 LearningRate 0.0085 Epoch: 14 Global Step: 175910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:55,648-Speed 3274.09 samples/sec Loss 1.9709 LearningRate 0.0085 Epoch: 14 Global Step: 175920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:58:58,785-Speed 3265.57 samples/sec Loss 1.9745 LearningRate 0.0085 Epoch: 14 Global Step: 175930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:59:01,886-Speed 3303.06 samples/sec Loss 1.9954 LearningRate 0.0085 Epoch: 14 Global Step: 175940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:59:05,006-Speed 3283.37 samples/sec Loss 
2.0816 LearningRate 0.0085 Epoch: 14 Global Step: 175950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:59:08,119-Speed 3290.50 samples/sec Loss 2.0054 LearningRate 0.0085 Epoch: 14 Global Step: 175960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:59:11,201-Speed 3324.03 samples/sec Loss 2.0203 LearningRate 0.0085 Epoch: 14 Global Step: 175970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:59:14,279-Speed 3327.16 samples/sec Loss 1.9647 LearningRate 0.0085 Epoch: 14 Global Step: 175980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 16:59:17,390-Speed 3292.96 samples/sec Loss 2.0710 LearningRate 0.0085 Epoch: 14 Global Step: 175990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:20,500-Speed 3294.22 samples/sec Loss 2.0418 LearningRate 0.0085 Epoch: 14 Global Step: 176000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:23,622-Speed 3280.41 samples/sec Loss 1.9947 LearningRate 0.0085 Epoch: 14 Global Step: 176010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:26,718-Speed 3308.53 samples/sec Loss 1.9958 LearningRate 0.0085 Epoch: 14 Global Step: 176020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:29,808-Speed 3315.26 samples/sec Loss 2.0033 LearningRate 0.0085 Epoch: 14 Global Step: 176030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:32,869-Speed 3346.86 samples/sec Loss 2.0111 LearningRate 0.0085 Epoch: 14 Global Step: 176040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:35,949-Speed 3325.25 samples/sec Loss 2.0260 LearningRate 0.0085 Epoch: 14 Global Step: 176050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:39,029-Speed 3326.64 samples/sec Loss 2.0497 LearningRate 0.0085 Epoch: 14 Global Step: 176060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:42,102-Speed 3333.04 samples/sec Loss 2.0670 LearningRate 0.0085 Epoch: 14 Global 
Step: 176070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:45,177-Speed 3330.84 samples/sec Loss 2.0914 LearningRate 0.0085 Epoch: 14 Global Step: 176080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:48,361-Speed 3217.49 samples/sec Loss 2.0193 LearningRate 0.0085 Epoch: 14 Global Step: 176090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:51,440-Speed 3326.77 samples/sec Loss 2.0244 LearningRate 0.0085 Epoch: 14 Global Step: 176100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:54,504-Speed 3342.43 samples/sec Loss 1.9950 LearningRate 0.0085 Epoch: 14 Global Step: 176110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 16:59:57,580-Speed 3330.47 samples/sec Loss 2.1180 LearningRate 0.0085 Epoch: 14 Global Step: 176120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:00,651-Speed 3335.92 samples/sec Loss 2.0375 LearningRate 0.0085 Epoch: 14 Global Step: 176130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:03,790-Speed 3263.20 samples/sec Loss 1.9959 LearningRate 0.0085 Epoch: 14 Global Step: 176140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:06,900-Speed 3293.65 samples/sec Loss 1.9648 LearningRate 0.0085 Epoch: 14 Global Step: 176150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:09,965-Speed 3341.54 samples/sec Loss 2.0112 LearningRate 0.0085 Epoch: 14 Global Step: 176160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:13,083-Speed 3286.05 samples/sec Loss 1.9719 LearningRate 0.0085 Epoch: 14 Global Step: 176170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:16,212-Speed 3272.84 samples/sec Loss 1.9491 LearningRate 0.0085 Epoch: 14 Global Step: 176180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:19,308-Speed 3308.71 samples/sec Loss 2.0343 LearningRate 0.0085 Epoch: 14 Global Step: 176190 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-27 17:00:22,379-Speed 3335.61 samples/sec Loss 2.0665 LearningRate 0.0085 Epoch: 14 Global Step: 176200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:25,605-Speed 3175.88 samples/sec Loss 2.0262 LearningRate 0.0084 Epoch: 14 Global Step: 176210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:28,688-Speed 3322.42 samples/sec Loss 2.0161 LearningRate 0.0084 Epoch: 14 Global Step: 176220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:31,779-Speed 3313.78 samples/sec Loss 2.0555 LearningRate 0.0084 Epoch: 14 Global Step: 176230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:34,856-Speed 3328.48 samples/sec Loss 2.0790 LearningRate 0.0084 Epoch: 14 Global Step: 176240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:37,910-Speed 3354.12 samples/sec Loss 2.0487 LearningRate 0.0084 Epoch: 14 Global Step: 176250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:41,010-Speed 3304.68 samples/sec Loss 2.0099 LearningRate 0.0084 Epoch: 14 Global Step: 176260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:00:44,098-Speed 3316.49 samples/sec Loss 2.1103 LearningRate 0.0084 Epoch: 14 Global Step: 176270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:00:47,207-Speed 3295.38 samples/sec Loss 1.9975 LearningRate 0.0084 Epoch: 14 Global Step: 176280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:00:50,281-Speed 3331.13 samples/sec Loss 2.0011 LearningRate 0.0084 Epoch: 14 Global Step: 176290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:00:53,393-Speed 3292.32 samples/sec Loss 2.0231 LearningRate 0.0084 Epoch: 14 Global Step: 176300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:00:56,481-Speed 3316.57 samples/sec Loss 1.9980 LearningRate 0.0084 Epoch: 14 Global Step: 176310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
17:00:59,611-Speed 3273.25 samples/sec Loss 1.9778 LearningRate 0.0084 Epoch: 14 Global Step: 176320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:01:02,782-Speed 3230.45 samples/sec Loss 2.0278 LearningRate 0.0084 Epoch: 14 Global Step: 176330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:01:05,948-Speed 3236.06 samples/sec Loss 2.1039 LearningRate 0.0084 Epoch: 14 Global Step: 176340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:01:09,011-Speed 3343.90 samples/sec Loss 2.0227 LearningRate 0.0084 Epoch: 14 Global Step: 176350 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:01:12,084-Speed 3333.49 samples/sec Loss 1.9971 LearningRate 0.0084 Epoch: 14 Global Step: 176360 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:15,233-Speed 3252.98 samples/sec Loss 2.0257 LearningRate 0.0084 Epoch: 14 Global Step: 176370 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:18,369-Speed 3266.04 samples/sec Loss 2.0669 LearningRate 0.0084 Epoch: 14 Global Step: 176380 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:21,426-Speed 3351.48 samples/sec Loss 2.0893 LearningRate 0.0084 Epoch: 14 Global Step: 176390 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:24,543-Speed 3286.37 samples/sec Loss 1.9884 LearningRate 0.0084 Epoch: 14 Global Step: 176400 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:27,717-Speed 3226.90 samples/sec Loss 1.9970 LearningRate 0.0084 Epoch: 14 Global Step: 176410 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:30,832-Speed 3287.86 samples/sec Loss 2.0146 LearningRate 0.0084 Epoch: 14 Global Step: 176420 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:33,900-Speed 3339.06 samples/sec Loss 2.0302 LearningRate 0.0084 Epoch: 14 Global Step: 176430 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:36,970-Speed 3336.54 samples/sec Loss 2.1134 
LearningRate 0.0084 Epoch: 14 Global Step: 176440 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:40,080-Speed 3293.92 samples/sec Loss 2.0540 LearningRate 0.0084 Epoch: 14 Global Step: 176450 Fp16 Grad Scale: 4096 Required: 6 hours Training: 2022-04-27 17:01:43,253-Speed 3228.09 samples/sec Loss 1.9980 LearningRate 0.0084 Epoch: 14 Global Step: 176460 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:01:46,341-Speed 3317.51 samples/sec Loss 2.0097 LearningRate 0.0084 Epoch: 14 Global Step: 176470 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:01:49,412-Speed 3334.96 samples/sec Loss 2.0208 LearningRate 0.0084 Epoch: 14 Global Step: 176480 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:01:52,470-Speed 3350.50 samples/sec Loss 1.9916 LearningRate 0.0084 Epoch: 14 Global Step: 176490 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:01:55,600-Speed 3272.36 samples/sec Loss 1.9895 LearningRate 0.0084 Epoch: 14 Global Step: 176500 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:01:58,684-Speed 3321.02 samples/sec Loss 2.0851 LearningRate 0.0084 Epoch: 14 Global Step: 176510 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:02:01,818-Speed 3269.05 samples/sec Loss 2.0279 LearningRate 0.0084 Epoch: 14 Global Step: 176520 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:02:04,943-Speed 3277.84 samples/sec Loss 1.9207 LearningRate 0.0084 Epoch: 14 Global Step: 176530 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:02:08,070-Speed 3276.05 samples/sec Loss 2.0489 LearningRate 0.0084 Epoch: 14 Global Step: 176540 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:02:11,161-Speed 3314.10 samples/sec Loss 2.1190 LearningRate 0.0084 Epoch: 14 Global Step: 176550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:02:14,297-Speed 3266.39 samples/sec Loss 2.0784 LearningRate 0.0084 Epoch: 14 Global Step: 176560 Fp16 
Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:17,481-Speed 3217.12 samples/sec Loss 2.0274 LearningRate 0.0084 Epoch: 14 Global Step: 176570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:20,570-Speed 3315.88 samples/sec Loss 1.9857 LearningRate 0.0084 Epoch: 14 Global Step: 176580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:23,743-Speed 3227.61 samples/sec Loss 2.0089 LearningRate 0.0084 Epoch: 14 Global Step: 176590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:26,926-Speed 3218.52 samples/sec Loss 2.0082 LearningRate 0.0084 Epoch: 14 Global Step: 176600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:30,034-Speed 3295.70 samples/sec Loss 2.0520 LearningRate 0.0084 Epoch: 14 Global Step: 176610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:33,118-Speed 3321.05 samples/sec Loss 2.0100 LearningRate 0.0084 Epoch: 14 Global Step: 176620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:36,288-Speed 3231.17 samples/sec Loss 2.0230 LearningRate 0.0084 Epoch: 14 Global Step: 176630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:39,354-Speed 3340.81 samples/sec Loss 2.0341 LearningRate 0.0083 Epoch: 14 Global Step: 176640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:42,560-Speed 3195.79 samples/sec Loss 2.0501 LearningRate 0.0083 Epoch: 14 Global Step: 176650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:02:45,647-Speed 3317.70 samples/sec Loss 2.0236 LearningRate 0.0083 Epoch: 14 Global Step: 176660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:02:48,792-Speed 3257.22 samples/sec Loss 2.0381 LearningRate 0.0083 Epoch: 14 Global Step: 176670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:02:51,990-Speed 3202.79 samples/sec Loss 2.0469 LearningRate 0.0083 Epoch: 14 Global Step: 176680 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-04-27 17:02:55,110-Speed 3282.86 samples/sec Loss 1.9762 LearningRate 0.0083 Epoch: 14 Global Step: 176690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:02:58,170-Speed 3348.16 samples/sec Loss 2.0794 LearningRate 0.0083 Epoch: 14 Global Step: 176700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:01,283-Speed 3290.45 samples/sec Loss 1.9928 LearningRate 0.0083 Epoch: 14 Global Step: 176710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:04,411-Speed 3274.43 samples/sec Loss 2.0361 LearningRate 0.0083 Epoch: 14 Global Step: 176720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:07,499-Speed 3317.18 samples/sec Loss 2.0344 LearningRate 0.0083 Epoch: 14 Global Step: 176730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:10,573-Speed 3332.01 samples/sec Loss 2.0719 LearningRate 0.0083 Epoch: 14 Global Step: 176740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:13,668-Speed 3310.65 samples/sec Loss 2.0135 LearningRate 0.0083 Epoch: 14 Global Step: 176750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:16,704-Speed 3373.46 samples/sec Loss 2.0525 LearningRate 0.0083 Epoch: 14 Global Step: 176760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:19,778-Speed 3332.45 samples/sec Loss 2.0211 LearningRate 0.0083 Epoch: 14 Global Step: 176770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:22,837-Speed 3348.70 samples/sec Loss 2.0146 LearningRate 0.0083 Epoch: 14 Global Step: 176780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:25,944-Speed 3296.62 samples/sec Loss 2.0408 LearningRate 0.0083 Epoch: 14 Global Step: 176790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:29,037-Speed 3311.49 samples/sec Loss 2.0170 LearningRate 0.0083 Epoch: 14 Global Step: 176800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:32,147-Speed 
3293.61 samples/sec Loss 2.0039 LearningRate 0.0083 Epoch: 14 Global Step: 176810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:35,205-Speed 3350.18 samples/sec Loss 2.0540 LearningRate 0.0083 Epoch: 14 Global Step: 176820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:38,283-Speed 3328.11 samples/sec Loss 2.0012 LearningRate 0.0083 Epoch: 14 Global Step: 176830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:41,388-Speed 3298.81 samples/sec Loss 2.0583 LearningRate 0.0083 Epoch: 14 Global Step: 176840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:44,499-Speed 3291.84 samples/sec Loss 2.0631 LearningRate 0.0083 Epoch: 14 Global Step: 176850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:47,641-Speed 3260.48 samples/sec Loss 2.0566 LearningRate 0.0083 Epoch: 14 Global Step: 176860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:50,758-Speed 3286.77 samples/sec Loss 2.0662 LearningRate 0.0083 Epoch: 14 Global Step: 176870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:03:53,892-Speed 3268.30 samples/sec Loss 2.1008 LearningRate 0.0083 Epoch: 14 Global Step: 176880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:03:56,941-Speed 3359.89 samples/sec Loss 2.0405 LearningRate 0.0083 Epoch: 14 Global Step: 176890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:00,070-Speed 3273.04 samples/sec Loss 2.0820 LearningRate 0.0083 Epoch: 14 Global Step: 176900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:03,230-Speed 3242.10 samples/sec Loss 2.0538 LearningRate 0.0083 Epoch: 14 Global Step: 176910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:06,429-Speed 3201.15 samples/sec Loss 2.0882 LearningRate 0.0083 Epoch: 14 Global Step: 176920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:09,495-Speed 3341.40 samples/sec Loss 2.1207 
LearningRate 0.0083 Epoch: 14 Global Step: 176930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:12,610-Speed 3288.32 samples/sec Loss 2.0433 LearningRate 0.0083 Epoch: 14 Global Step: 176940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:15,759-Speed 3253.08 samples/sec Loss 2.0497 LearningRate 0.0083 Epoch: 14 Global Step: 176950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:18,863-Speed 3300.15 samples/sec Loss 2.0750 LearningRate 0.0083 Epoch: 14 Global Step: 176960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:21,919-Speed 3351.30 samples/sec Loss 2.0864 LearningRate 0.0083 Epoch: 14 Global Step: 176970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:25,034-Speed 3288.81 samples/sec Loss 2.0838 LearningRate 0.0083 Epoch: 14 Global Step: 176980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:04:28,216-Speed 3219.23 samples/sec Loss 2.1055 LearningRate 0.0083 Epoch: 14 Global Step: 176990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:31,311-Speed 3310.04 samples/sec Loss 2.0809 LearningRate 0.0083 Epoch: 14 Global Step: 177000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:34,389-Speed 3327.38 samples/sec Loss 2.1034 LearningRate 0.0083 Epoch: 14 Global Step: 177010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:37,469-Speed 3326.24 samples/sec Loss 2.0499 LearningRate 0.0083 Epoch: 14 Global Step: 177020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:40,561-Speed 3312.36 samples/sec Loss 2.0392 LearningRate 0.0083 Epoch: 14 Global Step: 177030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:43,639-Speed 3327.57 samples/sec Loss 2.0953 LearningRate 0.0083 Epoch: 14 Global Step: 177040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:46,729-Speed 3315.34 samples/sec Loss 2.0691 LearningRate 0.0083 Epoch: 14 Global Step: 
177050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:49,873-Speed 3257.85 samples/sec Loss 2.0799 LearningRate 0.0083 Epoch: 14 Global Step: 177060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:52,952-Speed 3326.85 samples/sec Loss 2.1002 LearningRate 0.0082 Epoch: 14 Global Step: 177070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:56,012-Speed 3347.81 samples/sec Loss 2.1186 LearningRate 0.0082 Epoch: 14 Global Step: 177080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:04:59,078-Speed 3340.80 samples/sec Loss 2.0490 LearningRate 0.0082 Epoch: 14 Global Step: 177090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:05:02,213-Speed 3267.73 samples/sec Loss 2.0198 LearningRate 0.0082 Epoch: 14 Global Step: 177100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:05:05,280-Speed 3339.50 samples/sec Loss 2.0979 LearningRate 0.0082 Epoch: 14 Global Step: 177110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:05:08,415-Speed 3267.37 samples/sec Loss 2.0741 LearningRate 0.0082 Epoch: 14 Global Step: 177120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:05:11,557-Speed 3260.45 samples/sec Loss 2.0524 LearningRate 0.0082 Epoch: 14 Global Step: 177130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:05:14,631-Speed 3332.16 samples/sec Loss 2.0456 LearningRate 0.0082 Epoch: 14 Global Step: 177140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:17,749-Speed 3285.40 samples/sec Loss 2.0326 LearningRate 0.0082 Epoch: 14 Global Step: 177150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:20,798-Speed 3359.19 samples/sec Loss 2.1074 LearningRate 0.0082 Epoch: 14 Global Step: 177160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:23,855-Speed 3350.75 samples/sec Loss 2.0307 LearningRate 0.0082 Epoch: 14 Global Step: 177170 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-04-27 17:05:26,977-Speed 3281.07 samples/sec Loss 2.1551 LearningRate 0.0082 Epoch: 14 Global Step: 177180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:30,152-Speed 3226.64 samples/sec Loss 2.0966 LearningRate 0.0082 Epoch: 14 Global Step: 177190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:33,249-Speed 3307.50 samples/sec Loss 2.0725 LearningRate 0.0082 Epoch: 14 Global Step: 177200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:36,401-Speed 3250.01 samples/sec Loss 2.0152 LearningRate 0.0082 Epoch: 14 Global Step: 177210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:39,541-Speed 3262.00 samples/sec Loss 2.0722 LearningRate 0.0082 Epoch: 14 Global Step: 177220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:42,696-Speed 3247.40 samples/sec Loss 2.0843 LearningRate 0.0082 Epoch: 14 Global Step: 177230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:45,776-Speed 3324.83 samples/sec Loss 2.0143 LearningRate 0.0082 Epoch: 14 Global Step: 177240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:48,860-Speed 3321.46 samples/sec Loss 2.0152 LearningRate 0.0082 Epoch: 14 Global Step: 177250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:05:51,900-Speed 3369.74 samples/sec Loss 2.0819 LearningRate 0.0082 Epoch: 14 Global Step: 177260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:05:54,988-Speed 3317.26 samples/sec Loss 2.0225 LearningRate 0.0082 Epoch: 14 Global Step: 177270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:05:58,071-Speed 3322.33 samples/sec Loss 2.0537 LearningRate 0.0082 Epoch: 14 Global Step: 177280 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:01,246-Speed 3226.20 samples/sec Loss 2.0345 LearningRate 0.0082 Epoch: 14 Global Step: 177290 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:04,393-Speed 
3255.49 samples/sec Loss 2.0304 LearningRate 0.0082 Epoch: 14 Global Step: 177300 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:07,476-Speed 3322.34 samples/sec Loss 2.0427 LearningRate 0.0082 Epoch: 14 Global Step: 177310 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:10,572-Speed 3309.06 samples/sec Loss 2.1140 LearningRate 0.0082 Epoch: 14 Global Step: 177320 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:13,657-Speed 3319.76 samples/sec Loss 2.0460 LearningRate 0.0082 Epoch: 14 Global Step: 177330 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:16,715-Speed 3349.62 samples/sec Loss 2.0985 LearningRate 0.0082 Epoch: 14 Global Step: 177340 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:19,769-Speed 3353.77 samples/sec Loss 2.0896 LearningRate 0.0082 Epoch: 14 Global Step: 177350 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:22,969-Speed 3200.87 samples/sec Loss 2.0490 LearningRate 0.0082 Epoch: 14 Global Step: 177360 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:26,101-Speed 3271.53 samples/sec Loss 2.1067 LearningRate 0.0082 Epoch: 14 Global Step: 177370 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:06:29,243-Speed 3259.45 samples/sec Loss 2.1287 LearningRate 0.0082 Epoch: 14 Global Step: 177380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:32,349-Speed 3298.46 samples/sec Loss 2.0946 LearningRate 0.0082 Epoch: 14 Global Step: 177390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:35,408-Speed 3348.54 samples/sec Loss 2.0409 LearningRate 0.0082 Epoch: 14 Global Step: 177400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:38,503-Speed 3309.64 samples/sec Loss 2.0491 LearningRate 0.0082 Epoch: 14 Global Step: 177410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:41,613-Speed 3293.15 samples/sec Loss 2.0563 LearningRate 0.0082 
Epoch: 14 Global Step: 177420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:44,684-Speed 3335.14 samples/sec Loss 2.0694 LearningRate 0.0082 Epoch: 14 Global Step: 177430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:47,796-Speed 3291.94 samples/sec Loss 2.0941 LearningRate 0.0082 Epoch: 14 Global Step: 177440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:50,849-Speed 3355.82 samples/sec Loss 2.1036 LearningRate 0.0082 Epoch: 14 Global Step: 177450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:54,026-Speed 3223.67 samples/sec Loss 2.0259 LearningRate 0.0082 Epoch: 14 Global Step: 177460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:06:57,100-Speed 3332.54 samples/sec Loss 2.1023 LearningRate 0.0082 Epoch: 14 Global Step: 177470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:00,210-Speed 3293.56 samples/sec Loss 2.1124 LearningRate 0.0082 Epoch: 14 Global Step: 177480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:07:03,340-Speed 3272.34 samples/sec Loss 2.1621 LearningRate 0.0082 Epoch: 14 Global Step: 177490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:07:06,454-Speed 3289.60 samples/sec Loss 2.0846 LearningRate 0.0082 Epoch: 14 Global Step: 177500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:09,501-Speed 3361.37 samples/sec Loss 2.0260 LearningRate 0.0081 Epoch: 14 Global Step: 177510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:12,633-Speed 3271.36 samples/sec Loss 2.0560 LearningRate 0.0081 Epoch: 14 Global Step: 177520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:15,712-Speed 3326.62 samples/sec Loss 2.0932 LearningRate 0.0081 Epoch: 14 Global Step: 177530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:18,828-Speed 3287.42 samples/sec Loss 2.1179 LearningRate 0.0081 Epoch: 14 Global Step: 177540 Fp16 Grad 
Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:21,919-Speed 3313.70 samples/sec Loss 2.1067 LearningRate 0.0081 Epoch: 14 Global Step: 177550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:25,026-Speed 3297.22 samples/sec Loss 2.0563 LearningRate 0.0081 Epoch: 14 Global Step: 177560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:28,127-Speed 3302.08 samples/sec Loss 2.1283 LearningRate 0.0081 Epoch: 14 Global Step: 177570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:31,195-Speed 3338.78 samples/sec Loss 2.0933 LearningRate 0.0081 Epoch: 14 Global Step: 177580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:34,240-Speed 3364.15 samples/sec Loss 2.0679 LearningRate 0.0081 Epoch: 14 Global Step: 177590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:37,329-Speed 3316.24 samples/sec Loss 2.0475 LearningRate 0.0081 Epoch: 14 Global Step: 177600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:40,388-Speed 3349.19 samples/sec Loss 2.0841 LearningRate 0.0081 Epoch: 14 Global Step: 177610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:43,454-Speed 3340.62 samples/sec Loss 2.0981 LearningRate 0.0081 Epoch: 14 Global Step: 177620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:46,544-Speed 3315.56 samples/sec Loss 2.0310 LearningRate 0.0081 Epoch: 14 Global Step: 177630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:49,679-Speed 3266.85 samples/sec Loss 2.1115 LearningRate 0.0081 Epoch: 14 Global Step: 177640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:52,888-Speed 3192.56 samples/sec Loss 2.1024 LearningRate 0.0081 Epoch: 14 Global Step: 177650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:07:56,008-Speed 3282.76 samples/sec Loss 2.0494 LearningRate 0.0081 Epoch: 14 Global Step: 177660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 
2022-04-27 17:07:59,066-Speed 3348.93 samples/sec Loss 2.1082 LearningRate 0.0081 Epoch: 14 Global Step: 177670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:02,136-Speed 3337.02 samples/sec Loss 2.0539 LearningRate 0.0081 Epoch: 14 Global Step: 177680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:05,237-Speed 3303.10 samples/sec Loss 2.0923 LearningRate 0.0081 Epoch: 14 Global Step: 177690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:08,305-Speed 3339.11 samples/sec Loss 2.0079 LearningRate 0.0081 Epoch: 14 Global Step: 177700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:08:11,371-Speed 3340.28 samples/sec Loss 2.0987 LearningRate 0.0081 Epoch: 14 Global Step: 177710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:08:14,455-Speed 3321.66 samples/sec Loss 2.0595 LearningRate 0.0081 Epoch: 14 Global Step: 177720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:08:17,589-Speed 3269.09 samples/sec Loss 2.1285 LearningRate 0.0081 Epoch: 14 Global Step: 177730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:08:20,664-Speed 3330.74 samples/sec Loss 2.0753 LearningRate 0.0081 Epoch: 14 Global Step: 177740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:23,798-Speed 3268.50 samples/sec Loss 2.1241 LearningRate 0.0081 Epoch: 14 Global Step: 177750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:26,947-Speed 3252.71 samples/sec Loss 2.0898 LearningRate 0.0081 Epoch: 14 Global Step: 177760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:30,080-Speed 3269.49 samples/sec Loss 2.0941 LearningRate 0.0081 Epoch: 14 Global Step: 177770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:33,197-Speed 3286.20 samples/sec Loss 2.0387 LearningRate 0.0081 Epoch: 14 Global Step: 177780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:36,286-Speed 3315.68 
samples/sec Loss 2.0826 LearningRate 0.0081 Epoch: 14 Global Step: 177790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:39,428-Speed 3261.12 samples/sec Loss 2.0853 LearningRate 0.0081 Epoch: 14 Global Step: 177800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:42,652-Speed 3177.00 samples/sec Loss 2.0609 LearningRate 0.0081 Epoch: 14 Global Step: 177810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:08:45,684-Speed 3378.18 samples/sec Loss 2.0736 LearningRate 0.0081 Epoch: 14 Global Step: 177820 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:08:48,755-Speed 3335.86 samples/sec Loss 2.1209 LearningRate 0.0081 Epoch: 14 Global Step: 177830 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:08:51,878-Speed 3279.55 samples/sec Loss 2.0769 LearningRate 0.0081 Epoch: 14 Global Step: 177840 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:08:54,979-Speed 3303.24 samples/sec Loss 2.1143 LearningRate 0.0081 Epoch: 14 Global Step: 177850 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:08:58,036-Speed 3351.07 samples/sec Loss 2.1299 LearningRate 0.0081 Epoch: 14 Global Step: 177860 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:09:01,105-Speed 3336.68 samples/sec Loss 2.1125 LearningRate 0.0081 Epoch: 14 Global Step: 177870 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:09:04,171-Speed 3341.25 samples/sec Loss 2.1380 LearningRate 0.0081 Epoch: 14 Global Step: 177880 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:09:07,254-Speed 3323.23 samples/sec Loss 2.1380 LearningRate 0.0081 Epoch: 14 Global Step: 177890 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:09:10,404-Speed 3251.61 samples/sec Loss 2.1102 LearningRate 0.0081 Epoch: 14 Global Step: 177900 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:09:13,520-Speed 3286.70 samples/sec Loss 2.1392 LearningRate 0.0081 Epoch: 
14 Global Step: 177910 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:09:16,663-Speed 3258.94 samples/sec Loss 2.0449 LearningRate 0.0081 Epoch: 14 Global Step: 177920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:19,798-Speed 3267.52 samples/sec Loss 2.1185 LearningRate 0.0081 Epoch: 14 Global Step: 177930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:22,853-Speed 3352.86 samples/sec Loss 2.1155 LearningRate 0.0080 Epoch: 14 Global Step: 177940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:25,940-Speed 3318.49 samples/sec Loss 2.1450 LearningRate 0.0080 Epoch: 14 Global Step: 177950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:29,057-Speed 3286.10 samples/sec Loss 2.0994 LearningRate 0.0080 Epoch: 14 Global Step: 177960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:32,148-Speed 3314.52 samples/sec Loss 2.1071 LearningRate 0.0080 Epoch: 14 Global Step: 177970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:35,254-Speed 3297.09 samples/sec Loss 2.0776 LearningRate 0.0080 Epoch: 14 Global Step: 177980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:38,381-Speed 3276.12 samples/sec Loss 2.0407 LearningRate 0.0080 Epoch: 14 Global Step: 177990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:41,471-Speed 3315.41 samples/sec Loss 2.0056 LearningRate 0.0080 Epoch: 14 Global Step: 178000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:44,527-Speed 3352.13 samples/sec Loss 2.1075 LearningRate 0.0080 Epoch: 14 Global Step: 178010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:09:47,654-Speed 3274.99 samples/sec Loss 2.1005 LearningRate 0.0080 Epoch: 14 Global Step: 178020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:09:50,777-Speed 3280.06 samples/sec Loss 2.1439 LearningRate 0.0080 Epoch: 14 Global Step: 178030 Fp16 Grad Scale: 
32768 Required: 6 hours Training: 2022-04-27 17:09:53,929-Speed 3250.03 samples/sec Loss 2.0961 LearningRate 0.0080 Epoch: 14 Global Step: 178040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:09:57,020-Speed 3314.53 samples/sec Loss 2.0688 LearningRate 0.0080 Epoch: 14 Global Step: 178050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:00,150-Speed 3271.59 samples/sec Loss 2.1025 LearningRate 0.0080 Epoch: 14 Global Step: 178060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:03,251-Speed 3303.80 samples/sec Loss 2.1217 LearningRate 0.0080 Epoch: 14 Global Step: 178070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:06,356-Speed 3299.01 samples/sec Loss 2.0708 LearningRate 0.0080 Epoch: 14 Global Step: 178080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:09,457-Speed 3303.37 samples/sec Loss 2.0784 LearningRate 0.0080 Epoch: 14 Global Step: 178090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:12,555-Speed 3306.44 samples/sec Loss 2.0240 LearningRate 0.0080 Epoch: 14 Global Step: 178100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:15,681-Speed 3276.45 samples/sec Loss 2.0893 LearningRate 0.0080 Epoch: 14 Global Step: 178110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:18,769-Speed 3317.24 samples/sec Loss 2.0823 LearningRate 0.0080 Epoch: 14 Global Step: 178120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:10:21,838-Speed 3337.82 samples/sec Loss 2.0417 LearningRate 0.0080 Epoch: 14 Global Step: 178130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:24,920-Speed 3323.23 samples/sec Loss 2.1077 LearningRate 0.0080 Epoch: 14 Global Step: 178140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:28,020-Speed 3304.23 samples/sec Loss 2.0889 LearningRate 0.0080 Epoch: 14 Global Step: 178150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-04-27 17:10:31,179-Speed 3243.02 samples/sec Loss 2.1223 LearningRate 0.0080 Epoch: 14 Global Step: 178160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:34,258-Speed 3326.19 samples/sec Loss 2.0364 LearningRate 0.0080 Epoch: 14 Global Step: 178170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:37,378-Speed 3283.38 samples/sec Loss 2.1269 LearningRate 0.0080 Epoch: 14 Global Step: 178180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:40,469-Speed 3313.45 samples/sec Loss 2.0777 LearningRate 0.0080 Epoch: 14 Global Step: 178190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:43,607-Speed 3264.78 samples/sec Loss 2.0968 LearningRate 0.0080 Epoch: 14 Global Step: 178200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:10:46,714-Speed 3296.94 samples/sec Loss 2.0822 LearningRate 0.0080 Epoch: 14 Global Step: 178210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:10:49,855-Speed 3261.76 samples/sec Loss 2.0379 LearningRate 0.0080 Epoch: 14 Global Step: 178220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:10:53,008-Speed 3248.48 samples/sec Loss 2.1768 LearningRate 0.0080 Epoch: 14 Global Step: 178230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:10:56,072-Speed 3342.37 samples/sec Loss 2.1163 LearningRate 0.0080 Epoch: 14 Global Step: 178240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:10:59,185-Speed 3290.32 samples/sec Loss 2.1267 LearningRate 0.0080 Epoch: 14 Global Step: 178250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:02,276-Speed 3314.88 samples/sec Loss 2.0659 LearningRate 0.0080 Epoch: 14 Global Step: 178260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:05,389-Speed 3290.24 samples/sec Loss 2.1069 LearningRate 0.0080 Epoch: 14 Global Step: 178270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:08,511-Speed 3280.62 
samples/sec Loss 2.1034 LearningRate 0.0080 Epoch: 14 Global Step: 178280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:11,632-Speed 3282.46 samples/sec Loss 2.1141 LearningRate 0.0080 Epoch: 14 Global Step: 178290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:14,757-Speed 3277.70 samples/sec Loss 2.0659 LearningRate 0.0080 Epoch: 14 Global Step: 178300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:17,847-Speed 3314.30 samples/sec Loss 2.1022 LearningRate 0.0080 Epoch: 14 Global Step: 178310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:11:20,964-Speed 3286.53 samples/sec Loss 2.0969 LearningRate 0.0080 Epoch: 14 Global Step: 178320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:11:24,099-Speed 3267.56 samples/sec Loss 2.1324 LearningRate 0.0080 Epoch: 14 Global Step: 178330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:27,297-Speed 3202.55 samples/sec Loss 2.0840 LearningRate 0.0080 Epoch: 14 Global Step: 178340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:30,363-Speed 3340.87 samples/sec Loss 2.0905 LearningRate 0.0080 Epoch: 14 Global Step: 178350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:33,415-Speed 3356.72 samples/sec Loss 2.1388 LearningRate 0.0080 Epoch: 14 Global Step: 178360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:11:36,595-Speed 3221.23 samples/sec Loss 2.1029 LearningRate 0.0080 Epoch: 14 Global Step: 178370 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:11:39,730-Speed 3266.88 samples/sec Loss 2.1182 LearningRate 0.0079 Epoch: 14 Global Step: 178380 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:11:42,852-Speed 3281.30 samples/sec Loss 2.1344 LearningRate 0.0079 Epoch: 14 Global Step: 178390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:11:45,925-Speed 3332.78 samples/sec Loss 2.1438 LearningRate 0.0079 
Epoch: 14 Global Step: 178400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:11:49,071-Speed 3256.57 samples/sec Loss 2.0786 LearningRate 0.0079 Epoch: 14 Global Step: 178410 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:11:52,168-Speed 3308.01 samples/sec Loss 2.1712 LearningRate 0.0079 Epoch: 14 Global Step: 178420 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:11:55,259-Speed 3313.66 samples/sec Loss 2.1135 LearningRate 0.0079 Epoch: 14 Global Step: 178430 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:11:58,338-Speed 3326.83 samples/sec Loss 2.1888 LearningRate 0.0079 Epoch: 14 Global Step: 178440 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:01,433-Speed 3309.14 samples/sec Loss 2.1346 LearningRate 0.0079 Epoch: 14 Global Step: 178450 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:04,537-Speed 3300.27 samples/sec Loss 2.1493 LearningRate 0.0079 Epoch: 14 Global Step: 178460 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:07,670-Speed 3269.74 samples/sec Loss 2.1129 LearningRate 0.0079 Epoch: 14 Global Step: 178470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:12:10,745-Speed 3330.16 samples/sec Loss 2.1818 LearningRate 0.0079 Epoch: 14 Global Step: 178480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:12:13,882-Speed 3266.23 samples/sec Loss 2.1062 LearningRate 0.0079 Epoch: 14 Global Step: 178490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:12:17,018-Speed 3266.61 samples/sec Loss 2.1124 LearningRate 0.0079 Epoch: 14 Global Step: 178500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:12:20,081-Speed 3343.27 samples/sec Loss 2.0867 LearningRate 0.0079 Epoch: 14 Global Step: 178510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:12:23,165-Speed 3321.23 samples/sec Loss 2.0926 LearningRate 0.0079 Epoch: 14 Global Step: 178520 Fp16 Grad Scale: 
16384 Required: 6 hours Training: 2022-04-27 17:12:26,291-Speed 3277.60 samples/sec Loss 2.1496 LearningRate 0.0079 Epoch: 14 Global Step: 178530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:12:29,368-Speed 3328.71 samples/sec Loss 2.1698 LearningRate 0.0079 Epoch: 14 Global Step: 178540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:12:32,446-Speed 3327.93 samples/sec Loss 2.1050 LearningRate 0.0079 Epoch: 14 Global Step: 178550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:35,543-Speed 3306.99 samples/sec Loss 2.0798 LearningRate 0.0079 Epoch: 14 Global Step: 178560 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:38,653-Speed 3294.09 samples/sec Loss 2.1369 LearningRate 0.0079 Epoch: 14 Global Step: 178570 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:41,780-Speed 3275.07 samples/sec Loss 2.0945 LearningRate 0.0079 Epoch: 14 Global Step: 178580 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:44,911-Speed 3272.38 samples/sec Loss 2.1236 LearningRate 0.0079 Epoch: 14 Global Step: 178590 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:48,089-Speed 3223.19 samples/sec Loss 2.1243 LearningRate 0.0079 Epoch: 14 Global Step: 178600 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:51,269-Speed 3220.20 samples/sec Loss 2.0804 LearningRate 0.0079 Epoch: 14 Global Step: 178610 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:54,434-Speed 3237.08 samples/sec Loss 2.1164 LearningRate 0.0079 Epoch: 14 Global Step: 178620 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:12:57,558-Speed 3278.36 samples/sec Loss 2.1162 LearningRate 0.0079 Epoch: 14 Global Step: 178630 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:13:00,685-Speed 3275.54 samples/sec Loss 2.1754 LearningRate 0.0079 Epoch: 14 Global Step: 178640 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 
17:13:03,839-Speed 3247.82 samples/sec Loss 2.1005 LearningRate 0.0079 Epoch: 14 Global Step: 178650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:06,945-Speed 3297.92 samples/sec Loss 2.1418 LearningRate 0.0079 Epoch: 14 Global Step: 178660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:10,032-Speed 3318.37 samples/sec Loss 2.0467 LearningRate 0.0079 Epoch: 14 Global Step: 178670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:13,123-Speed 3313.62 samples/sec Loss 2.1208 LearningRate 0.0079 Epoch: 14 Global Step: 178680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:16,193-Speed 3336.58 samples/sec Loss 2.1430 LearningRate 0.0079 Epoch: 14 Global Step: 178690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:19,280-Speed 3319.01 samples/sec Loss 2.1156 LearningRate 0.0079 Epoch: 14 Global Step: 178700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:22,390-Speed 3292.47 samples/sec Loss 2.1326 LearningRate 0.0079 Epoch: 14 Global Step: 178710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:25,538-Speed 3254.09 samples/sec Loss 2.0950 LearningRate 0.0079 Epoch: 14 Global Step: 178720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:28,676-Speed 3265.05 samples/sec Loss 2.1784 LearningRate 0.0079 Epoch: 14 Global Step: 178730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:31,823-Speed 3254.67 samples/sec Loss 2.0987 LearningRate 0.0079 Epoch: 14 Global Step: 178740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:34,978-Speed 3246.11 samples/sec Loss 2.1120 LearningRate 0.0079 Epoch: 14 Global Step: 178750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:13:38,155-Speed 3224.07 samples/sec Loss 2.1434 LearningRate 0.0079 Epoch: 14 Global Step: 178760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:13:41,248-Speed 3311.96 samples/sec Loss 
2.1858 LearningRate 0.0079 Epoch: 14 Global Step: 178770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:13:44,342-Speed 3310.91 samples/sec Loss 2.1189 LearningRate 0.0079 Epoch: 14 Global Step: 178780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:13:47,486-Speed 3256.96 samples/sec Loss 2.1108 LearningRate 0.0079 Epoch: 14 Global Step: 178790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:13:50,564-Speed 3328.34 samples/sec Loss 2.1454 LearningRate 0.0079 Epoch: 14 Global Step: 178800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:13:53,671-Speed 3296.79 samples/sec Loss 2.0650 LearningRate 0.0079 Epoch: 14 Global Step: 178810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:56,766-Speed 3309.35 samples/sec Loss 2.1341 LearningRate 0.0078 Epoch: 14 Global Step: 178820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:13:59,884-Speed 3285.45 samples/sec Loss 2.1344 LearningRate 0.0078 Epoch: 14 Global Step: 178830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:03,041-Speed 3243.78 samples/sec Loss 2.0217 LearningRate 0.0078 Epoch: 14 Global Step: 178840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:06,119-Speed 3328.11 samples/sec Loss 2.1221 LearningRate 0.0078 Epoch: 14 Global Step: 178850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:09,189-Speed 3337.23 samples/sec Loss 2.2143 LearningRate 0.0078 Epoch: 14 Global Step: 178860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:12,360-Speed 3230.29 samples/sec Loss 2.0729 LearningRate 0.0078 Epoch: 14 Global Step: 178870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:15,462-Speed 3301.05 samples/sec Loss 2.1544 LearningRate 0.0078 Epoch: 14 Global Step: 178880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:18,557-Speed 3311.60 samples/sec Loss 2.0783 LearningRate 0.0078 Epoch: 14 Global 
Step: 178890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:21,639-Speed 3323.19 samples/sec Loss 2.1310 LearningRate 0.0078 Epoch: 14 Global Step: 178900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:24,711-Speed 3334.53 samples/sec Loss 2.1083 LearningRate 0.0078 Epoch: 14 Global Step: 178910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:27,823-Speed 3292.41 samples/sec Loss 2.1339 LearningRate 0.0078 Epoch: 14 Global Step: 178920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:30,929-Speed 3297.24 samples/sec Loss 2.1432 LearningRate 0.0078 Epoch: 14 Global Step: 178930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:34,005-Speed 3330.20 samples/sec Loss 2.1146 LearningRate 0.0078 Epoch: 14 Global Step: 178940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:37,097-Speed 3312.58 samples/sec Loss 2.1109 LearningRate 0.0078 Epoch: 14 Global Step: 178950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:40,186-Speed 3315.78 samples/sec Loss 2.0784 LearningRate 0.0078 Epoch: 14 Global Step: 178960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:43,296-Speed 3293.84 samples/sec Loss 2.1418 LearningRate 0.0078 Epoch: 14 Global Step: 178970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:46,404-Speed 3295.82 samples/sec Loss 2.1044 LearningRate 0.0078 Epoch: 14 Global Step: 178980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:49,539-Speed 3267.41 samples/sec Loss 2.0819 LearningRate 0.0078 Epoch: 14 Global Step: 178990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:52,682-Speed 3258.77 samples/sec Loss 2.1460 LearningRate 0.0078 Epoch: 14 Global Step: 179000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:14:55,767-Speed 3320.82 samples/sec Loss 2.1600 LearningRate 0.0078 Epoch: 14 Global Step: 179010 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 17:14:58,872-Speed 3298.12 samples/sec Loss 2.1597 LearningRate 0.0078 Epoch: 14 Global Step: 179020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:15:02,007-Speed 3268.08 samples/sec Loss 2.1169 LearningRate 0.0078 Epoch: 14 Global Step: 179030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:15:05,098-Speed 3313.99 samples/sec Loss 2.0875 LearningRate 0.0078 Epoch: 14 Global Step: 179040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:15:08,177-Speed 3326.99 samples/sec Loss 2.0713 LearningRate 0.0078 Epoch: 14 Global Step: 179050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:11,278-Speed 3302.85 samples/sec Loss 2.1566 LearningRate 0.0078 Epoch: 14 Global Step: 179060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:14,356-Speed 3328.24 samples/sec Loss 2.1379 LearningRate 0.0078 Epoch: 14 Global Step: 179070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:17,555-Speed 3201.77 samples/sec Loss 2.0780 LearningRate 0.0078 Epoch: 14 Global Step: 179080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:20,622-Speed 3339.51 samples/sec Loss 2.0756 LearningRate 0.0078 Epoch: 14 Global Step: 179090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:23,725-Speed 3301.13 samples/sec Loss 2.1204 LearningRate 0.0078 Epoch: 14 Global Step: 179100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:26,824-Speed 3306.15 samples/sec Loss 2.1656 LearningRate 0.0078 Epoch: 14 Global Step: 179110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:29,896-Speed 3333.75 samples/sec Loss 2.1154 LearningRate 0.0078 Epoch: 14 Global Step: 179120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:32,982-Speed 3318.80 samples/sec Loss 2.1401 LearningRate 0.0078 Epoch: 14 Global Step: 179130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
17:15:36,074-Speed 3313.68 samples/sec Loss 2.1171 LearningRate 0.0078 Epoch: 14 Global Step: 179140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:39,156-Speed 3322.59 samples/sec Loss 2.1735 LearningRate 0.0078 Epoch: 14 Global Step: 179150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:42,247-Speed 3314.23 samples/sec Loss 2.0890 LearningRate 0.0078 Epoch: 14 Global Step: 179160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:45,315-Speed 3338.90 samples/sec Loss 2.1455 LearningRate 0.0078 Epoch: 14 Global Step: 179170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:48,477-Speed 3239.28 samples/sec Loss 2.1216 LearningRate 0.0078 Epoch: 14 Global Step: 179180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:51,596-Speed 3284.02 samples/sec Loss 2.1546 LearningRate 0.0078 Epoch: 14 Global Step: 179190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:54,717-Speed 3282.28 samples/sec Loss 2.1107 LearningRate 0.0078 Epoch: 14 Global Step: 179200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:15:57,812-Speed 3310.25 samples/sec Loss 2.0864 LearningRate 0.0078 Epoch: 14 Global Step: 179210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:16:00,897-Speed 3320.52 samples/sec Loss 2.1796 LearningRate 0.0078 Epoch: 14 Global Step: 179220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:16:04,072-Speed 3226.43 samples/sec Loss 2.1883 LearningRate 0.0078 Epoch: 14 Global Step: 179230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:16:07,195-Speed 3279.23 samples/sec Loss 2.1734 LearningRate 0.0078 Epoch: 14 Global Step: 179240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:16:10,279-Speed 3321.19 samples/sec Loss 2.1718 LearningRate 0.0078 Epoch: 14 Global Step: 179250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:16:13,360-Speed 3325.37 samples/sec Loss 
2.1144 LearningRate 0.0078 Epoch: 14 Global Step: 179260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:16:16,444-Speed 3321.21 samples/sec Loss 2.0973 LearningRate 0.0077 Epoch: 14 Global Step: 179270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:16:19,575-Speed 3271.79 samples/sec Loss 2.1069 LearningRate 0.0077 Epoch: 14 Global Step: 179280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:16:22,670-Speed 3309.89 samples/sec Loss 2.1301 LearningRate 0.0077 Epoch: 14 Global Step: 179290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:16:25,801-Speed 3270.87 samples/sec Loss 2.1181 LearningRate 0.0077 Epoch: 14 Global Step: 179300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:16:28,897-Speed 3308.70 samples/sec Loss 2.0762 LearningRate 0.0077 Epoch: 14 Global Step: 179310 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:32,001-Speed 3300.68 samples/sec Loss 2.0662 LearningRate 0.0077 Epoch: 14 Global Step: 179320 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:35,130-Speed 3272.73 samples/sec Loss 2.1172 LearningRate 0.0077 Epoch: 14 Global Step: 179330 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:38,326-Speed 3205.63 samples/sec Loss 2.1054 LearningRate 0.0077 Epoch: 14 Global Step: 179340 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:41,523-Speed 3203.68 samples/sec Loss 2.1359 LearningRate 0.0077 Epoch: 14 Global Step: 179350 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:44,684-Speed 3240.06 samples/sec Loss 2.1423 LearningRate 0.0077 Epoch: 14 Global Step: 179360 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:47,760-Speed 3330.30 samples/sec Loss 2.1157 LearningRate 0.0077 Epoch: 14 Global Step: 179370 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:50,862-Speed 3301.84 samples/sec Loss 2.1570 LearningRate 0.0077 Epoch: 14 Global Step: 
179380 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:54,071-Speed 3191.42 samples/sec Loss 2.1285 LearningRate 0.0077 Epoch: 14 Global Step: 179390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:16:57,169-Speed 3306.53 samples/sec Loss 2.2192 LearningRate 0.0077 Epoch: 14 Global Step: 179400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:17:00,308-Speed 3263.65 samples/sec Loss 2.1699 LearningRate 0.0077 Epoch: 14 Global Step: 179410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:03,506-Speed 3203.52 samples/sec Loss 2.0821 LearningRate 0.0077 Epoch: 14 Global Step: 179420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:06,683-Speed 3224.11 samples/sec Loss 2.2026 LearningRate 0.0077 Epoch: 14 Global Step: 179430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:09,766-Speed 3321.50 samples/sec Loss 2.1624 LearningRate 0.0077 Epoch: 14 Global Step: 179440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:12,883-Speed 3286.65 samples/sec Loss 2.0597 LearningRate 0.0077 Epoch: 14 Global Step: 179450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:15,990-Speed 3297.34 samples/sec Loss 2.0887 LearningRate 0.0077 Epoch: 14 Global Step: 179460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:19,151-Speed 3240.30 samples/sec Loss 2.1461 LearningRate 0.0077 Epoch: 14 Global Step: 179470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:22,256-Speed 3298.38 samples/sec Loss 2.1266 LearningRate 0.0077 Epoch: 14 Global Step: 179480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:25,384-Speed 3274.50 samples/sec Loss 2.1781 LearningRate 0.0077 Epoch: 14 Global Step: 179490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:28,547-Speed 3238.74 samples/sec Loss 2.0823 LearningRate 0.0077 Epoch: 14 Global Step: 179500 Fp16 Grad Scale: 16384 Required: 6 
hours Training: 2022-04-27 17:17:31,632-Speed 3320.95 samples/sec Loss 2.1394 LearningRate 0.0077 Epoch: 14 Global Step: 179510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:17:34,709-Speed 3328.44 samples/sec Loss 2.0906 LearningRate 0.0077 Epoch: 14 Global Step: 179520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:17:37,809-Speed 3303.72 samples/sec Loss 2.1514 LearningRate 0.0077 Epoch: 14 Global Step: 179530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:40,941-Speed 3270.86 samples/sec Loss 2.1392 LearningRate 0.0077 Epoch: 14 Global Step: 179540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:44,045-Speed 3300.43 samples/sec Loss 2.1241 LearningRate 0.0077 Epoch: 14 Global Step: 179550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:47,171-Speed 3276.29 samples/sec Loss 2.1028 LearningRate 0.0077 Epoch: 14 Global Step: 179560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:50,273-Speed 3301.56 samples/sec Loss 2.1799 LearningRate 0.0077 Epoch: 14 Global Step: 179570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:53,433-Speed 3242.22 samples/sec Loss 2.1256 LearningRate 0.0077 Epoch: 14 Global Step: 179580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:56,529-Speed 3307.68 samples/sec Loss 2.0955 LearningRate 0.0077 Epoch: 14 Global Step: 179590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:17:59,694-Speed 3236.81 samples/sec Loss 2.1057 LearningRate 0.0077 Epoch: 14 Global Step: 179600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:02,816-Speed 3281.63 samples/sec Loss 2.1162 LearningRate 0.0077 Epoch: 14 Global Step: 179610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:05,937-Speed 3281.74 samples/sec Loss 2.1292 LearningRate 0.0077 Epoch: 14 Global Step: 179620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
17:18:09,006-Speed 3337.55 samples/sec Loss 2.1494 LearningRate 0.0077 Epoch: 14 Global Step: 179630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:18:12,071-Speed 3342.31 samples/sec Loss 2.1751 LearningRate 0.0077 Epoch: 14 Global Step: 179640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:18:15,163-Speed 3312.50 samples/sec Loss 2.0975 LearningRate 0.0077 Epoch: 14 Global Step: 179650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:18:18,328-Speed 3236.65 samples/sec Loss 2.1561 LearningRate 0.0077 Epoch: 14 Global Step: 179660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:18:21,402-Speed 3332.51 samples/sec Loss 2.1682 LearningRate 0.0077 Epoch: 14 Global Step: 179670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:18:24,566-Speed 3236.64 samples/sec Loss 2.1735 LearningRate 0.0077 Epoch: 14 Global Step: 179680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:27,648-Speed 3323.54 samples/sec Loss 2.1062 LearningRate 0.0077 Epoch: 14 Global Step: 179690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:30,743-Speed 3310.89 samples/sec Loss 2.1748 LearningRate 0.0077 Epoch: 14 Global Step: 179700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:33,839-Speed 3307.53 samples/sec Loss 2.1079 LearningRate 0.0077 Epoch: 14 Global Step: 179710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:36,918-Speed 3327.37 samples/sec Loss 2.1659 LearningRate 0.0076 Epoch: 14 Global Step: 179720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:40,015-Speed 3307.84 samples/sec Loss 2.1375 LearningRate 0.0076 Epoch: 14 Global Step: 179730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:43,241-Speed 3174.37 samples/sec Loss 2.0843 LearningRate 0.0076 Epoch: 14 Global Step: 179740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:46,302-Speed 3346.80 samples/sec Loss 
2.1330 LearningRate 0.0076 Epoch: 14 Global Step: 179750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:49,459-Speed 3244.96 samples/sec Loss 2.1213 LearningRate 0.0076 Epoch: 14 Global Step: 179760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:52,570-Speed 3292.01 samples/sec Loss 2.1491 LearningRate 0.0076 Epoch: 14 Global Step: 179770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:18:55,644-Speed 3332.67 samples/sec Loss 2.2198 LearningRate 0.0076 Epoch: 14 Global Step: 179780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:18:58,746-Speed 3302.40 samples/sec Loss 2.1321 LearningRate 0.0076 Epoch: 14 Global Step: 179790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:01,860-Speed 3289.67 samples/sec Loss 2.1088 LearningRate 0.0076 Epoch: 14 Global Step: 179800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:04,985-Speed 3277.87 samples/sec Loss 2.1416 LearningRate 0.0076 Epoch: 14 Global Step: 179810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:08,035-Speed 3357.95 samples/sec Loss 2.1370 LearningRate 0.0076 Epoch: 14 Global Step: 179820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:11,118-Speed 3322.53 samples/sec Loss 2.1497 LearningRate 0.0076 Epoch: 14 Global Step: 179830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:14,237-Speed 3283.83 samples/sec Loss 2.1195 LearningRate 0.0076 Epoch: 14 Global Step: 179840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:17,399-Speed 3240.08 samples/sec Loss 2.1111 LearningRate 0.0076 Epoch: 14 Global Step: 179850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:20,502-Speed 3301.11 samples/sec Loss 2.1342 LearningRate 0.0076 Epoch: 14 Global Step: 179860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:23,642-Speed 3261.90 samples/sec Loss 2.1094 LearningRate 0.0076 Epoch: 14 Global 
Step: 179870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:26,766-Speed 3278.93 samples/sec Loss 2.1382 LearningRate 0.0076 Epoch: 14 Global Step: 179880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:29,908-Speed 3259.84 samples/sec Loss 2.1772 LearningRate 0.0076 Epoch: 14 Global Step: 179890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:33,019-Speed 3291.97 samples/sec Loss 2.1617 LearningRate 0.0076 Epoch: 14 Global Step: 179900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:36,135-Speed 3288.51 samples/sec Loss 2.1934 LearningRate 0.0076 Epoch: 14 Global Step: 179910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:19:39,202-Speed 3338.78 samples/sec Loss 2.1044 LearningRate 0.0076 Epoch: 14 Global Step: 179920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:42,301-Speed 3306.11 samples/sec Loss 2.1424 LearningRate 0.0076 Epoch: 14 Global Step: 179930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:45,426-Speed 3278.00 samples/sec Loss 2.1336 LearningRate 0.0076 Epoch: 14 Global Step: 179940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:48,545-Speed 3283.87 samples/sec Loss 2.1512 LearningRate 0.0076 Epoch: 14 Global Step: 179950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:51,723-Speed 3222.86 samples/sec Loss 2.2403 LearningRate 0.0076 Epoch: 14 Global Step: 179960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:54,883-Speed 3240.92 samples/sec Loss 2.1435 LearningRate 0.0076 Epoch: 14 Global Step: 179970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:19:57,967-Speed 3322.46 samples/sec Loss 2.1609 LearningRate 0.0076 Epoch: 14 Global Step: 179980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:01,166-Speed 3202.02 samples/sec Loss 2.1884 LearningRate 0.0076 Epoch: 14 Global Step: 179990 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 17:20:04,251-Speed 3319.87 samples/sec Loss 2.2012 LearningRate 0.0076 Epoch: 14 Global Step: 180000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:07,364-Speed 3290.33 samples/sec Loss 2.1178 LearningRate 0.0076 Epoch: 14 Global Step: 180010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:10,524-Speed 3242.19 samples/sec Loss 2.1494 LearningRate 0.0076 Epoch: 14 Global Step: 180020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:13,722-Speed 3203.19 samples/sec Loss 2.1383 LearningRate 0.0076 Epoch: 14 Global Step: 180030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:16,825-Speed 3300.57 samples/sec Loss 2.2087 LearningRate 0.0076 Epoch: 14 Global Step: 180040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:19,917-Speed 3313.02 samples/sec Loss 2.1390 LearningRate 0.0076 Epoch: 14 Global Step: 180050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:22,978-Speed 3346.52 samples/sec Loss 2.1602 LearningRate 0.0076 Epoch: 14 Global Step: 180060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:26,040-Speed 3346.06 samples/sec Loss 2.0596 LearningRate 0.0076 Epoch: 14 Global Step: 180070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:29,129-Speed 3315.56 samples/sec Loss 2.1479 LearningRate 0.0076 Epoch: 14 Global Step: 180080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:32,242-Speed 3290.13 samples/sec Loss 2.0893 LearningRate 0.0076 Epoch: 14 Global Step: 180090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:35,335-Speed 3312.46 samples/sec Loss 2.1698 LearningRate 0.0076 Epoch: 14 Global Step: 180100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:20:38,421-Speed 3319.22 samples/sec Loss 2.1432 LearningRate 0.0076 Epoch: 14 Global Step: 180110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
17:20:41,601-Speed 3221.29 samples/sec Loss 2.2162 LearningRate 0.0076 Epoch: 14 Global Step: 180120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:20:44,661-Speed 3347.28 samples/sec Loss 2.1208 LearningRate 0.0076 Epoch: 14 Global Step: 180130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:20:47,738-Speed 3329.53 samples/sec Loss 2.0935 LearningRate 0.0076 Epoch: 14 Global Step: 180140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:20:50,824-Speed 3319.10 samples/sec Loss 2.0910 LearningRate 0.0076 Epoch: 14 Global Step: 180150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:20:53,991-Speed 3234.51 samples/sec Loss 2.1610 LearningRate 0.0076 Epoch: 14 Global Step: 180160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:20:57,045-Speed 3353.77 samples/sec Loss 2.2226 LearningRate 0.0075 Epoch: 14 Global Step: 180170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:00,145-Speed 3304.53 samples/sec Loss 2.1410 LearningRate 0.0075 Epoch: 14 Global Step: 180180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:03,219-Speed 3331.69 samples/sec Loss 2.1987 LearningRate 0.0075 Epoch: 14 Global Step: 180190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:06,356-Speed 3266.46 samples/sec Loss 2.1103 LearningRate 0.0075 Epoch: 14 Global Step: 180200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:09,416-Speed 3347.59 samples/sec Loss 2.1746 LearningRate 0.0075 Epoch: 14 Global Step: 180210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:12,502-Speed 3318.69 samples/sec Loss 2.1416 LearningRate 0.0075 Epoch: 14 Global Step: 180220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:15,625-Speed 3280.47 samples/sec Loss 2.2302 LearningRate 0.0075 Epoch: 14 Global Step: 180230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:21:18,817-Speed 3208.38 samples/sec Loss 
2.1881 LearningRate 0.0075 Epoch: 14 Global Step: 180240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:21:21,913-Speed 3308.98 samples/sec Loss 2.1530 LearningRate 0.0075 Epoch: 14 Global Step: 180250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:21:25,011-Speed 3306.13 samples/sec Loss 2.1550 LearningRate 0.0075 Epoch: 14 Global Step: 180260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:28,152-Speed 3261.11 samples/sec Loss 2.1831 LearningRate 0.0075 Epoch: 14 Global Step: 180270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:31,241-Speed 3315.72 samples/sec Loss 2.0740 LearningRate 0.0075 Epoch: 14 Global Step: 180280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:34,296-Speed 3353.62 samples/sec Loss 2.1628 LearningRate 0.0075 Epoch: 14 Global Step: 180290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:37,428-Speed 3270.40 samples/sec Loss 2.1502 LearningRate 0.0075 Epoch: 14 Global Step: 180300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:40,545-Speed 3286.06 samples/sec Loss 2.2136 LearningRate 0.0075 Epoch: 14 Global Step: 180310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:43,688-Speed 3259.30 samples/sec Loss 2.1614 LearningRate 0.0075 Epoch: 14 Global Step: 180320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:46,780-Speed 3313.30 samples/sec Loss 2.1799 LearningRate 0.0075 Epoch: 14 Global Step: 180330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:49,849-Speed 3337.69 samples/sec Loss 2.2503 LearningRate 0.0075 Epoch: 14 Global Step: 180340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:52,971-Speed 3280.75 samples/sec Loss 2.2116 LearningRate 0.0075 Epoch: 14 Global Step: 180350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:21:56,062-Speed 3314.28 samples/sec Loss 2.1496 LearningRate 0.0075 Epoch: 14 Global 
Step: 180360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:21:59,124-Speed 3344.54 samples/sec Loss 2.1994 LearningRate 0.0075 Epoch: 14 Global Step: 180370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:02,209-Speed 3319.97 samples/sec Loss 2.0857 LearningRate 0.0075 Epoch: 14 Global Step: 180380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:05,350-Speed 3261.72 samples/sec Loss 2.1768 LearningRate 0.0075 Epoch: 14 Global Step: 180390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:08,413-Speed 3344.71 samples/sec Loss 2.1858 LearningRate 0.0075 Epoch: 14 Global Step: 180400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:11,472-Speed 3348.62 samples/sec Loss 2.1630 LearningRate 0.0075 Epoch: 14 Global Step: 180410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:14,540-Speed 3338.17 samples/sec Loss 2.1070 LearningRate 0.0075 Epoch: 14 Global Step: 180420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:17,689-Speed 3253.45 samples/sec Loss 2.1425 LearningRate 0.0075 Epoch: 14 Global Step: 180430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:20,762-Speed 3333.17 samples/sec Loss 2.2116 LearningRate 0.0075 Epoch: 14 Global Step: 180440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:23,819-Speed 3350.68 samples/sec Loss 2.1689 LearningRate 0.0075 Epoch: 14 Global Step: 180450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:26,900-Speed 3325.18 samples/sec Loss 2.0964 LearningRate 0.0075 Epoch: 14 Global Step: 180460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:30,012-Speed 3290.89 samples/sec Loss 2.1366 LearningRate 0.0075 Epoch: 14 Global Step: 180470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:22:33,084-Speed 3334.61 samples/sec Loss 2.1095 LearningRate 0.0075 Epoch: 14 Global Step: 180480 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 17:22:36,237-Speed 3248.50 samples/sec Loss 2.2039 LearningRate 0.0075 Epoch: 14 Global Step: 180490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:22:39,372-Speed 3268.05 samples/sec Loss 2.1576 LearningRate 0.0075 Epoch: 14 Global Step: 180500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:22:42,448-Speed 3329.88 samples/sec Loss 2.1125 LearningRate 0.0075 Epoch: 14 Global Step: 180510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:22:45,517-Speed 3338.31 samples/sec Loss 2.1640 LearningRate 0.0075 Epoch: 14 Global Step: 180520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:22:48,659-Speed 3259.71 samples/sec Loss 2.1340 LearningRate 0.0075 Epoch: 14 Global Step: 180530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:22:51,828-Speed 3232.51 samples/sec Loss 2.1208 LearningRate 0.0075 Epoch: 14 Global Step: 180540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:22:54,975-Speed 3254.49 samples/sec Loss 2.1252 LearningRate 0.0075 Epoch: 14 Global Step: 180550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:22:58,028-Speed 3355.95 samples/sec Loss 2.1688 LearningRate 0.0075 Epoch: 14 Global Step: 180560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:01,116-Speed 3316.10 samples/sec Loss 2.0849 LearningRate 0.0075 Epoch: 14 Global Step: 180570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:04,284-Speed 3233.57 samples/sec Loss 2.0987 LearningRate 0.0075 Epoch: 14 Global Step: 180580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:23:07,374-Speed 3315.09 samples/sec Loss 2.1983 LearningRate 0.0075 Epoch: 14 Global Step: 180590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:23:10,468-Speed 3310.92 samples/sec Loss 2.0801 LearningRate 0.0075 Epoch: 14 Global Step: 180600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
17:23:13,657-Speed 3211.32 samples/sec Loss 2.2334 LearningRate 0.0075 Epoch: 14 Global Step: 180610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:23:16,776-Speed 3284.12 samples/sec Loss 2.1451 LearningRate 0.0074 Epoch: 14 Global Step: 180620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:23:19,833-Speed 3351.36 samples/sec Loss 2.1693 LearningRate 0.0074 Epoch: 14 Global Step: 180630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:23:22,900-Speed 3340.40 samples/sec Loss 2.1949 LearningRate 0.0074 Epoch: 14 Global Step: 180640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:26,064-Speed 3237.33 samples/sec Loss 2.1204 LearningRate 0.0074 Epoch: 14 Global Step: 180650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:29,208-Speed 3258.04 samples/sec Loss 2.2553 LearningRate 0.0074 Epoch: 14 Global Step: 180660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:32,306-Speed 3305.91 samples/sec Loss 2.1249 LearningRate 0.0074 Epoch: 14 Global Step: 180670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:35,396-Speed 3315.72 samples/sec Loss 2.1872 LearningRate 0.0074 Epoch: 14 Global Step: 180680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:38,478-Speed 3323.50 samples/sec Loss 2.2221 LearningRate 0.0074 Epoch: 14 Global Step: 180690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:41,590-Speed 3291.19 samples/sec Loss 2.1473 LearningRate 0.0074 Epoch: 14 Global Step: 180700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:44,679-Speed 3316.25 samples/sec Loss 2.1488 LearningRate 0.0074 Epoch: 14 Global Step: 180710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:47,910-Speed 3170.26 samples/sec Loss 2.1052 LearningRate 0.0074 Epoch: 14 Global Step: 180720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:50,997-Speed 3318.00 samples/sec Loss 
2.1627 LearningRate 0.0074 Epoch: 14 Global Step: 180730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:23:54,166-Speed 3231.72 samples/sec Loss 2.1411 LearningRate 0.0074 Epoch: 14 Global Step: 180740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:23:57,261-Speed 3309.97 samples/sec Loss 2.1293 LearningRate 0.0074 Epoch: 14 Global Step: 180750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:00,371-Speed 3293.82 samples/sec Loss 2.1084 LearningRate 0.0074 Epoch: 14 Global Step: 180760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:03,519-Speed 3253.93 samples/sec Loss 2.2010 LearningRate 0.0074 Epoch: 14 Global Step: 180770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:06,631-Speed 3291.90 samples/sec Loss 2.1706 LearningRate 0.0074 Epoch: 14 Global Step: 180780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:09,714-Speed 3322.39 samples/sec Loss 2.1890 LearningRate 0.0074 Epoch: 14 Global Step: 180790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:12,822-Speed 3295.41 samples/sec Loss 2.1958 LearningRate 0.0074 Epoch: 14 Global Step: 180800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:15,930-Speed 3296.68 samples/sec Loss 2.2049 LearningRate 0.0074 Epoch: 14 Global Step: 180810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:19,008-Speed 3326.96 samples/sec Loss 2.1693 LearningRate 0.0074 Epoch: 14 Global Step: 180820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:22,108-Speed 3304.60 samples/sec Loss 2.2091 LearningRate 0.0074 Epoch: 14 Global Step: 180830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:25,175-Speed 3339.59 samples/sec Loss 2.1844 LearningRate 0.0074 Epoch: 14 Global Step: 180840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:24:28,329-Speed 3248.69 samples/sec Loss 2.1661 LearningRate 0.0074 Epoch: 14 Global 
Step: 180850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:31,465-Speed 3265.90 samples/sec Loss 2.1828 LearningRate 0.0074 Epoch: 14 Global Step: 180860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:34,560-Speed 3309.10 samples/sec Loss 2.1983 LearningRate 0.0074 Epoch: 14 Global Step: 180870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:37,701-Speed 3260.95 samples/sec Loss 2.1109 LearningRate 0.0074 Epoch: 14 Global Step: 180880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:40,800-Speed 3305.49 samples/sec Loss 2.0977 LearningRate 0.0074 Epoch: 14 Global Step: 180890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:43,866-Speed 3340.48 samples/sec Loss 2.1225 LearningRate 0.0074 Epoch: 14 Global Step: 180900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:46,984-Speed 3286.08 samples/sec Loss 2.1222 LearningRate 0.0074 Epoch: 14 Global Step: 180910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:50,077-Speed 3311.00 samples/sec Loss 2.2213 LearningRate 0.0074 Epoch: 14 Global Step: 180920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:53,157-Speed 3325.80 samples/sec Loss 2.1351 LearningRate 0.0074 Epoch: 14 Global Step: 180930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:56,268-Speed 3292.82 samples/sec Loss 2.1402 LearningRate 0.0074 Epoch: 14 Global Step: 180940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:24:59,413-Speed 3257.13 samples/sec Loss 2.1139 LearningRate 0.0074 Epoch: 14 Global Step: 180950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:25:02,519-Speed 3297.91 samples/sec Loss 2.1744 LearningRate 0.0074 Epoch: 14 Global Step: 180960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:05,638-Speed 3283.71 samples/sec Loss 2.1500 LearningRate 0.0074 Epoch: 14 Global Step: 180970 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 17:25:08,769-Speed 3271.66 samples/sec Loss 2.1616 LearningRate 0.0074 Epoch: 14 Global Step: 180980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:11,863-Speed 3310.66 samples/sec Loss 2.1280 LearningRate 0.0074 Epoch: 14 Global Step: 180990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:15,034-Speed 3230.73 samples/sec Loss 2.1280 LearningRate 0.0074 Epoch: 14 Global Step: 181000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:18,233-Speed 3201.19 samples/sec Loss 2.0821 LearningRate 0.0074 Epoch: 14 Global Step: 181010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:21,311-Speed 3328.75 samples/sec Loss 2.1298 LearningRate 0.0074 Epoch: 14 Global Step: 181020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:24,437-Speed 3276.06 samples/sec Loss 2.1269 LearningRate 0.0074 Epoch: 14 Global Step: 181030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:27,576-Speed 3263.87 samples/sec Loss 2.1877 LearningRate 0.0074 Epoch: 14 Global Step: 181040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:30,701-Speed 3277.29 samples/sec Loss 2.1572 LearningRate 0.0074 Epoch: 14 Global Step: 181050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:33,779-Speed 3328.76 samples/sec Loss 2.2047 LearningRate 0.0074 Epoch: 14 Global Step: 181060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:36,886-Speed 3296.35 samples/sec Loss 2.1719 LearningRate 0.0074 Epoch: 14 Global Step: 181070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:39,962-Speed 3329.61 samples/sec Loss 2.1622 LearningRate 0.0073 Epoch: 14 Global Step: 181080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:43,077-Speed 3288.12 samples/sec Loss 2.1983 LearningRate 0.0073 Epoch: 14 Global Step: 181090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
17:25:46,186-Speed 3295.71 samples/sec Loss 2.2250 LearningRate 0.0073 Epoch: 14 Global Step: 181100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:49,287-Speed 3302.76 samples/sec Loss 2.1425 LearningRate 0.0073 Epoch: 14 Global Step: 181110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:52,393-Speed 3297.92 samples/sec Loss 2.1339 LearningRate 0.0073 Epoch: 14 Global Step: 181120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:55,509-Speed 3286.82 samples/sec Loss 2.2001 LearningRate 0.0073 Epoch: 14 Global Step: 181130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:25:58,616-Speed 3296.91 samples/sec Loss 2.1809 LearningRate 0.0073 Epoch: 14 Global Step: 181140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:01,703-Speed 3318.36 samples/sec Loss 2.1512 LearningRate 0.0073 Epoch: 14 Global Step: 181150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:04,823-Speed 3283.17 samples/sec Loss 2.1840 LearningRate 0.0073 Epoch: 14 Global Step: 181160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:26:07,917-Speed 3310.74 samples/sec Loss 2.1103 LearningRate 0.0073 Epoch: 14 Global Step: 181170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:10,988-Speed 3335.56 samples/sec Loss 2.1907 LearningRate 0.0073 Epoch: 14 Global Step: 181180 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:14,111-Speed 3279.90 samples/sec Loss 2.1872 LearningRate 0.0073 Epoch: 14 Global Step: 181190 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:17,201-Speed 3315.35 samples/sec Loss 2.2030 LearningRate 0.0073 Epoch: 14 Global Step: 181200 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:20,284-Speed 3322.89 samples/sec Loss 2.1549 LearningRate 0.0073 Epoch: 14 Global Step: 181210 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:23,413-Speed 3273.13 samples/sec Loss 
2.1467 LearningRate 0.0073 Epoch: 14 Global Step: 181220 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:26,515-Speed 3301.94 samples/sec Loss 2.2068 LearningRate 0.0073 Epoch: 14 Global Step: 181230 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:29,594-Speed 3327.07 samples/sec Loss 2.1572 LearningRate 0.0073 Epoch: 14 Global Step: 181240 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:32,695-Speed 3303.52 samples/sec Loss 2.1356 LearningRate 0.0073 Epoch: 14 Global Step: 181250 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:35,812-Speed 3285.94 samples/sec Loss 2.1820 LearningRate 0.0073 Epoch: 14 Global Step: 181260 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:38,942-Speed 3272.38 samples/sec Loss 2.1988 LearningRate 0.0073 Epoch: 14 Global Step: 181270 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:26:42,040-Speed 3306.00 samples/sec Loss 2.1802 LearningRate 0.0073 Epoch: 14 Global Step: 181280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:45,129-Speed 3316.89 samples/sec Loss 2.2771 LearningRate 0.0073 Epoch: 14 Global Step: 181290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:48,222-Speed 3310.68 samples/sec Loss 2.1288 LearningRate 0.0073 Epoch: 14 Global Step: 181300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:51,327-Speed 3298.87 samples/sec Loss 2.1731 LearningRate 0.0073 Epoch: 14 Global Step: 181310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:54,424-Speed 3307.68 samples/sec Loss 2.1060 LearningRate 0.0073 Epoch: 14 Global Step: 181320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:26:57,503-Speed 3327.63 samples/sec Loss 2.1744 LearningRate 0.0073 Epoch: 14 Global Step: 181330 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:00,684-Speed 3219.18 samples/sec Loss 2.1374 LearningRate 0.0073 Epoch: 14 Global Step: 
181340 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:03,800-Speed 3287.42 samples/sec Loss 2.2215 LearningRate 0.0073 Epoch: 14 Global Step: 181350 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:06,916-Speed 3287.36 samples/sec Loss 2.1934 LearningRate 0.0073 Epoch: 14 Global Step: 181360 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:09,987-Speed 3335.56 samples/sec Loss 2.1585 LearningRate 0.0073 Epoch: 14 Global Step: 181370 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:13,111-Speed 3278.68 samples/sec Loss 2.1445 LearningRate 0.0073 Epoch: 14 Global Step: 181380 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:16,202-Speed 3313.92 samples/sec Loss 2.1382 LearningRate 0.0073 Epoch: 14 Global Step: 181390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:19,284-Speed 3323.91 samples/sec Loss 2.1607 LearningRate 0.0073 Epoch: 14 Global Step: 181400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:22,390-Speed 3297.55 samples/sec Loss 2.1472 LearningRate 0.0073 Epoch: 14 Global Step: 181410 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:25,516-Speed 3277.87 samples/sec Loss 2.2065 LearningRate 0.0073 Epoch: 14 Global Step: 181420 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:27:28,657-Speed 3260.93 samples/sec Loss 2.1519 LearningRate 0.0073 Epoch: 14 Global Step: 181430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:31,800-Speed 3259.22 samples/sec Loss 2.2122 LearningRate 0.0073 Epoch: 14 Global Step: 181440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:34,938-Speed 3263.90 samples/sec Loss 2.1758 LearningRate 0.0073 Epoch: 14 Global Step: 181450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:38,115-Speed 3224.37 samples/sec Loss 2.1852 LearningRate 0.0073 Epoch: 14 Global Step: 181460 Fp16 Grad Scale: 16384 Required: 6 hours 
Training: 2022-04-27 17:27:41,310-Speed 3206.71 samples/sec Loss 2.1909 LearningRate 0.0073 Epoch: 14 Global Step: 181470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:44,414-Speed 3299.96 samples/sec Loss 2.1391 LearningRate 0.0073 Epoch: 14 Global Step: 181480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:47,572-Speed 3243.60 samples/sec Loss 2.2388 LearningRate 0.0073 Epoch: 14 Global Step: 181490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:50,673-Speed 3303.00 samples/sec Loss 2.1318 LearningRate 0.0073 Epoch: 14 Global Step: 181500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:53,773-Speed 3304.08 samples/sec Loss 2.1218 LearningRate 0.0073 Epoch: 14 Global Step: 181510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:56,871-Speed 3306.99 samples/sec Loss 2.1813 LearningRate 0.0073 Epoch: 14 Global Step: 181520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:27:59,976-Speed 3297.72 samples/sec Loss 2.1752 LearningRate 0.0073 Epoch: 14 Global Step: 181530 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:03,067-Speed 3314.62 samples/sec Loss 2.1594 LearningRate 0.0072 Epoch: 14 Global Step: 181540 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:06,140-Speed 3332.93 samples/sec Loss 2.1564 LearningRate 0.0072 Epoch: 14 Global Step: 181550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:09,229-Speed 3316.45 samples/sec Loss 2.1102 LearningRate 0.0072 Epoch: 14 Global Step: 181560 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:12,371-Speed 3259.29 samples/sec Loss 2.1976 LearningRate 0.0072 Epoch: 14 Global Step: 181570 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:15,497-Speed 3277.12 samples/sec Loss 2.1034 LearningRate 0.0072 Epoch: 14 Global Step: 181580 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:18,592-Speed 3309.54 
samples/sec Loss 2.2115 LearningRate 0.0072 Epoch: 14 Global Step: 181590 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:21,659-Speed 3339.86 samples/sec Loss 2.1351 LearningRate 0.0072 Epoch: 14 Global Step: 181600 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:24,809-Speed 3251.75 samples/sec Loss 2.1750 LearningRate 0.0072 Epoch: 14 Global Step: 181610 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:27,936-Speed 3275.56 samples/sec Loss 2.1848 LearningRate 0.0072 Epoch: 14 Global Step: 181620 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:28:31,035-Speed 3304.95 samples/sec Loss 2.1694 LearningRate 0.0072 Epoch: 14 Global Step: 181630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:34,169-Speed 3268.55 samples/sec Loss 2.1389 LearningRate 0.0072 Epoch: 14 Global Step: 181640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:37,284-Speed 3288.83 samples/sec Loss 2.1850 LearningRate 0.0072 Epoch: 14 Global Step: 181650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:40,383-Speed 3305.51 samples/sec Loss 2.1607 LearningRate 0.0072 Epoch: 14 Global Step: 181660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:43,487-Speed 3299.17 samples/sec Loss 2.2094 LearningRate 0.0072 Epoch: 14 Global Step: 181670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:46,589-Speed 3302.28 samples/sec Loss 2.1772 LearningRate 0.0072 Epoch: 14 Global Step: 181680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:49,694-Speed 3299.33 samples/sec Loss 2.1891 LearningRate 0.0072 Epoch: 14 Global Step: 181690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:52,811-Speed 3286.71 samples/sec Loss 2.1369 LearningRate 0.0072 Epoch: 14 Global Step: 181700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:55,891-Speed 3325.99 samples/sec Loss 2.0924 LearningRate 0.0072 
Epoch: 14 Global Step: 181710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:28:58,967-Speed 3330.10 samples/sec Loss 2.1420 LearningRate 0.0072 Epoch: 14 Global Step: 181720 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:02,096-Speed 3273.93 samples/sec Loss 2.1772 LearningRate 0.0072 Epoch: 14 Global Step: 181730 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:05,236-Speed 3262.00 samples/sec Loss 2.1834 LearningRate 0.0072 Epoch: 14 Global Step: 181740 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:08,313-Speed 3328.45 samples/sec Loss 2.2038 LearningRate 0.0072 Epoch: 14 Global Step: 181750 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:11,423-Speed 3294.51 samples/sec Loss 2.2076 LearningRate 0.0072 Epoch: 14 Global Step: 181760 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:14,502-Speed 3326.15 samples/sec Loss 2.2046 LearningRate 0.0072 Epoch: 14 Global Step: 181770 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:18,203-Speed 2767.65 samples/sec Loss 2.1769 LearningRate 0.0072 Epoch: 14 Global Step: 181780 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:21,260-Speed 3350.17 samples/sec Loss 2.1930 LearningRate 0.0072 Epoch: 14 Global Step: 181790 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:24,313-Speed 3355.39 samples/sec Loss 2.1837 LearningRate 0.0072 Epoch: 14 Global Step: 181800 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:27,364-Speed 3357.29 samples/sec Loss 2.1396 LearningRate 0.0072 Epoch: 14 Global Step: 181810 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:29:30,408-Speed 3365.44 samples/sec Loss 2.2140 LearningRate 0.0072 Epoch: 14 Global Step: 181820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:33,525-Speed 3286.59 samples/sec Loss 2.2239 LearningRate 0.0072 Epoch: 14 Global Step: 181830 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 17:29:36,594-Speed 3337.84 samples/sec Loss 2.2225 LearningRate 0.0072 Epoch: 14 Global Step: 181840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:39,743-Speed 3252.38 samples/sec Loss 2.1915 LearningRate 0.0072 Epoch: 14 Global Step: 181850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:42,810-Speed 3340.02 samples/sec Loss 2.1764 LearningRate 0.0072 Epoch: 14 Global Step: 181860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:45,921-Speed 3292.38 samples/sec Loss 2.1878 LearningRate 0.0072 Epoch: 14 Global Step: 181870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:49,002-Speed 3324.62 samples/sec Loss 2.1816 LearningRate 0.0072 Epoch: 14 Global Step: 181880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:52,123-Speed 3281.49 samples/sec Loss 2.1940 LearningRate 0.0072 Epoch: 14 Global Step: 181890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:55,224-Speed 3303.73 samples/sec Loss 2.1882 LearningRate 0.0072 Epoch: 14 Global Step: 181900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:29:58,259-Speed 3375.67 samples/sec Loss 2.1654 LearningRate 0.0072 Epoch: 14 Global Step: 181910 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:01,384-Speed 3277.34 samples/sec Loss 2.1334 LearningRate 0.0072 Epoch: 14 Global Step: 181920 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:04,572-Speed 3213.00 samples/sec Loss 2.2433 LearningRate 0.0072 Epoch: 14 Global Step: 181930 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:07,706-Speed 3269.44 samples/sec Loss 2.2156 LearningRate 0.0072 Epoch: 14 Global Step: 181940 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:10,795-Speed 3315.88 samples/sec Loss 2.1428 LearningRate 0.0072 Epoch: 14 Global Step: 181950 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 
17:30:13,900-Speed 3298.10 samples/sec Loss 2.2650 LearningRate 0.0072 Epoch: 14 Global Step: 181960 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:17,034-Speed 3268.38 samples/sec Loss 2.1518 LearningRate 0.0072 Epoch: 14 Global Step: 181970 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:20,168-Speed 3268.85 samples/sec Loss 2.0888 LearningRate 0.0072 Epoch: 14 Global Step: 181980 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:23,271-Speed 3301.00 samples/sec Loss 2.2273 LearningRate 0.0072 Epoch: 14 Global Step: 181990 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:26,365-Speed 3311.18 samples/sec Loss 2.1505 LearningRate 0.0071 Epoch: 14 Global Step: 182000 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:30:29,529-Speed 3236.86 samples/sec Loss 2.2405 LearningRate 0.0071 Epoch: 14 Global Step: 182010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:32,599-Speed 3336.81 samples/sec Loss 2.1607 LearningRate 0.0071 Epoch: 14 Global Step: 182020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:35,750-Speed 3250.74 samples/sec Loss 2.2322 LearningRate 0.0071 Epoch: 14 Global Step: 182030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:38,891-Speed 3261.81 samples/sec Loss 2.2004 LearningRate 0.0071 Epoch: 14 Global Step: 182040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:41,981-Speed 3315.01 samples/sec Loss 2.1597 LearningRate 0.0071 Epoch: 14 Global Step: 182050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:45,038-Speed 3350.95 samples/sec Loss 2.1799 LearningRate 0.0071 Epoch: 14 Global Step: 182060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:48,102-Speed 3342.55 samples/sec Loss 2.2367 LearningRate 0.0071 Epoch: 14 Global Step: 182070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:51,162-Speed 3347.33 samples/sec Loss 
2.1527 LearningRate 0.0071 Epoch: 14 Global Step: 182080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:54,860-Speed 2770.13 samples/sec Loss 2.2048 LearningRate 0.0071 Epoch: 14 Global Step: 182090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:30:58,542-Speed 2781.99 samples/sec Loss 2.1551 LearningRate 0.0071 Epoch: 14 Global Step: 182100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:02,965-Speed 2315.91 samples/sec Loss 2.1398 LearningRate 0.0071 Epoch: 14 Global Step: 182110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:06,628-Speed 2796.11 samples/sec Loss 2.1597 LearningRate 0.0071 Epoch: 14 Global Step: 182120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:09,676-Speed 3361.13 samples/sec Loss 2.1182 LearningRate 0.0071 Epoch: 14 Global Step: 182130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:12,773-Speed 3307.08 samples/sec Loss 2.1950 LearningRate 0.0071 Epoch: 14 Global Step: 182140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:15,858-Speed 3320.47 samples/sec Loss 2.2522 LearningRate 0.0071 Epoch: 14 Global Step: 182150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:18,954-Speed 3308.07 samples/sec Loss 2.1165 LearningRate 0.0071 Epoch: 14 Global Step: 182160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:22,025-Speed 3336.32 samples/sec Loss 2.1161 LearningRate 0.0071 Epoch: 14 Global Step: 182170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:25,179-Speed 3247.52 samples/sec Loss 2.1570 LearningRate 0.0071 Epoch: 14 Global Step: 182180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:28,287-Speed 3296.25 samples/sec Loss 2.1512 LearningRate 0.0071 Epoch: 14 Global Step: 182190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:31:31,406-Speed 3283.71 samples/sec Loss 2.1837 LearningRate 0.0071 Epoch: 14 Global 
Step: 182200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:34,491-Speed 3320.86 samples/sec Loss 2.1411 LearningRate 0.0071 Epoch: 14 Global Step: 182210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:37,560-Speed 3337.13 samples/sec Loss 2.1843 LearningRate 0.0071 Epoch: 14 Global Step: 182220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:40,711-Speed 3250.37 samples/sec Loss 2.2535 LearningRate 0.0071 Epoch: 14 Global Step: 182230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:43,819-Speed 3296.30 samples/sec Loss 2.1472 LearningRate 0.0071 Epoch: 14 Global Step: 182240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:46,915-Speed 3308.21 samples/sec Loss 2.1626 LearningRate 0.0071 Epoch: 14 Global Step: 182250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:50,024-Speed 3295.61 samples/sec Loss 2.2029 LearningRate 0.0071 Epoch: 14 Global Step: 182260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:53,173-Speed 3252.42 samples/sec Loss 2.2018 LearningRate 0.0071 Epoch: 14 Global Step: 182270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:56,288-Speed 3287.98 samples/sec Loss 2.2250 LearningRate 0.0071 Epoch: 14 Global Step: 182280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:31:59,426-Speed 3264.37 samples/sec Loss 2.1381 LearningRate 0.0071 Epoch: 14 Global Step: 182290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:32:02,500-Speed 3332.37 samples/sec Loss 2.1617 LearningRate 0.0071 Epoch: 14 Global Step: 182300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:32:05,571-Speed 3336.19 samples/sec Loss 2.1564 LearningRate 0.0071 Epoch: 14 Global Step: 182310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:32:08,706-Speed 3267.51 samples/sec Loss 2.1481 LearningRate 0.0071 Epoch: 14 Global Step: 182320 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 17:32:11,827-Speed 3281.64 samples/sec Loss 2.1339 LearningRate 0.0071 Epoch: 14 Global Step: 182330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:32:14,907-Speed 3325.34 samples/sec Loss 2.1653 LearningRate 0.0071 Epoch: 14 Global Step: 182340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:32:18,090-Speed 3218.18 samples/sec Loss 2.2244 LearningRate 0.0071 Epoch: 14 Global Step: 182350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:32:21,126-Speed 3374.11 samples/sec Loss 2.1324 LearningRate 0.0071 Epoch: 14 Global Step: 182360 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:24,213-Speed 3318.11 samples/sec Loss 2.1505 LearningRate 0.0071 Epoch: 14 Global Step: 182370 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:27,358-Speed 3256.69 samples/sec Loss 2.2207 LearningRate 0.0071 Epoch: 14 Global Step: 182380 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:30,495-Speed 3265.37 samples/sec Loss 2.1875 LearningRate 0.0071 Epoch: 14 Global Step: 182390 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:33,551-Speed 3352.50 samples/sec Loss 2.2281 LearningRate 0.0071 Epoch: 14 Global Step: 182400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:36,685-Speed 3268.74 samples/sec Loss 2.1771 LearningRate 0.0071 Epoch: 14 Global Step: 182410 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:39,757-Speed 3334.69 samples/sec Loss 2.1854 LearningRate 0.0071 Epoch: 14 Global Step: 182420 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:42,942-Speed 3216.20 samples/sec Loss 2.1816 LearningRate 0.0071 Epoch: 14 Global Step: 182430 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:46,027-Speed 3320.33 samples/sec Loss 2.2089 LearningRate 0.0071 Epoch: 14 Global Step: 182440 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 
17:32:49,171-Speed 3257.70 samples/sec Loss 2.1885 LearningRate 0.0071 Epoch: 14 Global Step: 182450 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:32:52,253-Speed 3323.37 samples/sec Loss 2.1397 LearningRate 0.0070 Epoch: 14 Global Step: 182460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:32:55,337-Speed 3321.79 samples/sec Loss 2.1833 LearningRate 0.0070 Epoch: 14 Global Step: 182470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:32:58,424-Speed 3317.95 samples/sec Loss 2.2077 LearningRate 0.0070 Epoch: 14 Global Step: 182480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:01,589-Speed 3236.67 samples/sec Loss 2.2045 LearningRate 0.0070 Epoch: 14 Global Step: 182490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:04,711-Speed 3280.43 samples/sec Loss 2.1841 LearningRate 0.0070 Epoch: 14 Global Step: 182500 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:07,812-Speed 3303.70 samples/sec Loss 2.1512 LearningRate 0.0070 Epoch: 14 Global Step: 182510 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:10,892-Speed 3325.30 samples/sec Loss 2.2619 LearningRate 0.0070 Epoch: 14 Global Step: 182520 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:14,004-Speed 3291.07 samples/sec Loss 2.1597 LearningRate 0.0070 Epoch: 14 Global Step: 182530 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:17,088-Speed 3322.38 samples/sec Loss 2.2296 LearningRate 0.0070 Epoch: 14 Global Step: 182540 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:20,197-Speed 3294.52 samples/sec Loss 2.2626 LearningRate 0.0070 Epoch: 14 Global Step: 182550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:23,261-Speed 3343.28 samples/sec Loss 2.1833 LearningRate 0.0070 Epoch: 14 Global Step: 182560 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:26,356-Speed 3309.39 samples/sec Loss 2.1731 
LearningRate 0.0070 Epoch: 14 Global Step: 182570 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:29,418-Speed 3345.69 samples/sec Loss 2.2198 LearningRate 0.0070 Epoch: 14 Global Step: 182580 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:32,528-Speed 3293.29 samples/sec Loss 2.1762 LearningRate 0.0070 Epoch: 14 Global Step: 182590 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:33:35,608-Speed 3325.70 samples/sec Loss 2.2233 LearningRate 0.0070 Epoch: 14 Global Step: 182600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:38,771-Speed 3238.74 samples/sec Loss 2.2430 LearningRate 0.0070 Epoch: 14 Global Step: 182610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:41,870-Speed 3304.78 samples/sec Loss 2.1677 LearningRate 0.0070 Epoch: 14 Global Step: 182620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:44,953-Speed 3322.37 samples/sec Loss 2.1782 LearningRate 0.0070 Epoch: 14 Global Step: 182630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:48,087-Speed 3268.82 samples/sec Loss 2.2018 LearningRate 0.0070 Epoch: 14 Global Step: 182640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:51,174-Speed 3318.47 samples/sec Loss 2.2018 LearningRate 0.0070 Epoch: 14 Global Step: 182650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:54,290-Speed 3286.62 samples/sec Loss 2.1766 LearningRate 0.0070 Epoch: 14 Global Step: 182660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:33:57,359-Speed 3337.83 samples/sec Loss 2.1822 LearningRate 0.0070 Epoch: 14 Global Step: 182670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:00,484-Speed 3278.39 samples/sec Loss 2.1693 LearningRate 0.0070 Epoch: 14 Global Step: 182680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:03,576-Speed 3312.95 samples/sec Loss 2.2575 LearningRate 0.0070 Epoch: 14 Global Step: 
182690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:06,627-Speed 3357.23 samples/sec Loss 2.1605 LearningRate 0.0070 Epoch: 14 Global Step: 182700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:34:09,688-Speed 3345.78 samples/sec Loss 2.1482 LearningRate 0.0070 Epoch: 14 Global Step: 182710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:34:12,855-Speed 3234.93 samples/sec Loss 2.1775 LearningRate 0.0070 Epoch: 14 Global Step: 182720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:34:15,978-Speed 3280.01 samples/sec Loss 2.2026 LearningRate 0.0070 Epoch: 14 Global Step: 182730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:34:19,098-Speed 3282.43 samples/sec Loss 2.2400 LearningRate 0.0070 Epoch: 14 Global Step: 182740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:34:22,154-Speed 3352.07 samples/sec Loss 2.2356 LearningRate 0.0070 Epoch: 14 Global Step: 182750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:25,224-Speed 3336.75 samples/sec Loss 2.1826 LearningRate 0.0070 Epoch: 14 Global Step: 182760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:28,275-Speed 3357.89 samples/sec Loss 2.2005 LearningRate 0.0070 Epoch: 14 Global Step: 182770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:31,340-Speed 3341.45 samples/sec Loss 2.1698 LearningRate 0.0070 Epoch: 14 Global Step: 182780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:34,466-Speed 3276.60 samples/sec Loss 2.2755 LearningRate 0.0070 Epoch: 14 Global Step: 182790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:37,657-Speed 3209.92 samples/sec Loss 2.1776 LearningRate 0.0070 Epoch: 14 Global Step: 182800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:40,797-Speed 3262.93 samples/sec Loss 2.1616 LearningRate 0.0070 Epoch: 14 Global Step: 182810 Fp16 Grad Scale: 16384 Required: 6 
hours Training: 2022-04-27 17:34:43,953-Speed 3245.17 samples/sec Loss 2.1926 LearningRate 0.0070 Epoch: 14 Global Step: 182820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:47,037-Speed 3321.17 samples/sec Loss 2.2168 LearningRate 0.0070 Epoch: 14 Global Step: 182830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:50,186-Speed 3252.98 samples/sec Loss 2.1849 LearningRate 0.0070 Epoch: 14 Global Step: 182840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:34:53,357-Speed 3230.39 samples/sec Loss 2.2269 LearningRate 0.0070 Epoch: 14 Global Step: 182850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:34:56,480-Speed 3279.75 samples/sec Loss 2.1642 LearningRate 0.0070 Epoch: 14 Global Step: 182860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:34:59,645-Speed 3236.71 samples/sec Loss 2.2505 LearningRate 0.0070 Epoch: 14 Global Step: 182870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:35:02,817-Speed 3229.11 samples/sec Loss 2.1320 LearningRate 0.0070 Epoch: 14 Global Step: 182880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:35:05,958-Speed 3261.25 samples/sec Loss 2.1209 LearningRate 0.0070 Epoch: 14 Global Step: 182890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:35:09,035-Speed 3329.35 samples/sec Loss 2.1628 LearningRate 0.0070 Epoch: 14 Global Step: 182900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:35:12,250-Speed 3185.50 samples/sec Loss 2.2062 LearningRate 0.0070 Epoch: 14 Global Step: 182910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:35:15,402-Speed 3250.36 samples/sec Loss 2.2184 LearningRate 0.0070 Epoch: 14 Global Step: 182920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:35:18,488-Speed 3318.80 samples/sec Loss 2.2152 LearningRate 0.0069 Epoch: 14 Global Step: 182930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
17:35:21,557-Speed 3338.25 samples/sec Loss 2.1926 LearningRate 0.0069 Epoch: 14 Global Step: 182940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:24,630-Speed 3332.85 samples/sec Loss 2.1082 LearningRate 0.0069 Epoch: 14 Global Step: 182950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:27,727-Speed 3306.71 samples/sec Loss 2.2087 LearningRate 0.0069 Epoch: 14 Global Step: 182960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:30,938-Speed 3191.00 samples/sec Loss 2.2095 LearningRate 0.0069 Epoch: 14 Global Step: 182970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:34,041-Speed 3300.27 samples/sec Loss 2.1617 LearningRate 0.0069 Epoch: 14 Global Step: 182980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:37,249-Speed 3193.56 samples/sec Loss 2.2042 LearningRate 0.0069 Epoch: 14 Global Step: 182990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:40,346-Speed 3307.58 samples/sec Loss 2.2372 LearningRate 0.0069 Epoch: 14 Global Step: 183000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:43,438-Speed 3312.38 samples/sec Loss 2.1454 LearningRate 0.0069 Epoch: 14 Global Step: 183010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:46,531-Speed 3312.06 samples/sec Loss 2.2539 LearningRate 0.0069 Epoch: 14 Global Step: 183020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:49,629-Speed 3306.74 samples/sec Loss 2.1682 LearningRate 0.0069 Epoch: 14 Global Step: 183030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:35:52,759-Speed 3271.86 samples/sec Loss 2.1720 LearningRate 0.0069 Epoch: 14 Global Step: 183040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:55,856-Speed 3308.04 samples/sec Loss 2.1532 LearningRate 0.0069 Epoch: 14 Global Step: 183050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:35:58,957-Speed 3302.99 samples/sec Loss 
2.1873 LearningRate 0.0069 Epoch: 14 Global Step: 183060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:02,113-Speed 3245.95 samples/sec Loss 2.1828 LearningRate 0.0069 Epoch: 14 Global Step: 183070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:05,290-Speed 3223.05 samples/sec Loss 2.2936 LearningRate 0.0069 Epoch: 14 Global Step: 183080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:08,432-Speed 3260.40 samples/sec Loss 2.1921 LearningRate 0.0069 Epoch: 14 Global Step: 183090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:11,552-Speed 3283.14 samples/sec Loss 2.2515 LearningRate 0.0069 Epoch: 14 Global Step: 183100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:14,683-Speed 3271.35 samples/sec Loss 2.1204 LearningRate 0.0069 Epoch: 14 Global Step: 183110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:17,837-Speed 3248.17 samples/sec Loss 2.1613 LearningRate 0.0069 Epoch: 14 Global Step: 183120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:20,925-Speed 3317.93 samples/sec Loss 2.1691 LearningRate 0.0069 Epoch: 14 Global Step: 183130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:24,095-Speed 3230.82 samples/sec Loss 2.2427 LearningRate 0.0069 Epoch: 14 Global Step: 183140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:36:27,291-Speed 3204.74 samples/sec Loss 2.1670 LearningRate 0.0069 Epoch: 14 Global Step: 183150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:36:30,512-Speed 3180.82 samples/sec Loss 2.1789 LearningRate 0.0069 Epoch: 14 Global Step: 183160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:36:33,630-Speed 3285.54 samples/sec Loss 2.1833 LearningRate 0.0069 Epoch: 14 Global Step: 183170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:36:36,733-Speed 3300.42 samples/sec Loss 2.1915 LearningRate 0.0069 Epoch: 14 Global 
Step: 183180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:39,941-Speed 3192.90 samples/sec Loss 2.1776 LearningRate 0.0069 Epoch: 14 Global Step: 183190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:43,070-Speed 3273.57 samples/sec Loss 2.1863 LearningRate 0.0069 Epoch: 14 Global Step: 183200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:46,147-Speed 3329.41 samples/sec Loss 2.2118 LearningRate 0.0069 Epoch: 14 Global Step: 183210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:49,278-Speed 3271.53 samples/sec Loss 2.1697 LearningRate 0.0069 Epoch: 14 Global Step: 183220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:52,372-Speed 3310.65 samples/sec Loss 2.1651 LearningRate 0.0069 Epoch: 14 Global Step: 183230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:55,506-Speed 3268.61 samples/sec Loss 2.2408 LearningRate 0.0069 Epoch: 14 Global Step: 183240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:36:58,624-Speed 3285.72 samples/sec Loss 2.1765 LearningRate 0.0069 Epoch: 14 Global Step: 183250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:01,693-Speed 3337.19 samples/sec Loss 2.2542 LearningRate 0.0069 Epoch: 14 Global Step: 183260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:04,875-Speed 3219.60 samples/sec Loss 2.1900 LearningRate 0.0069 Epoch: 14 Global Step: 183270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:08,014-Speed 3262.44 samples/sec Loss 2.2548 LearningRate 0.0069 Epoch: 14 Global Step: 183280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:11,081-Speed 3340.09 samples/sec Loss 2.2512 LearningRate 0.0069 Epoch: 14 Global Step: 183290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:14,178-Speed 3307.87 samples/sec Loss 2.1569 LearningRate 0.0069 Epoch: 14 Global Step: 183300 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 17:37:17,300-Speed 3281.48 samples/sec Loss 2.2019 LearningRate 0.0069 Epoch: 14 Global Step: 183310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:20,367-Speed 3338.67 samples/sec Loss 2.1376 LearningRate 0.0069 Epoch: 14 Global Step: 183320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:23,513-Speed 3256.36 samples/sec Loss 2.2308 LearningRate 0.0069 Epoch: 14 Global Step: 183330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:26,604-Speed 3314.45 samples/sec Loss 2.1990 LearningRate 0.0069 Epoch: 14 Global Step: 183340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:29,792-Speed 3212.82 samples/sec Loss 2.1848 LearningRate 0.0069 Epoch: 14 Global Step: 183350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:32,880-Speed 3316.52 samples/sec Loss 2.1972 LearningRate 0.0069 Epoch: 14 Global Step: 183360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:36,061-Speed 3221.00 samples/sec Loss 2.2579 LearningRate 0.0069 Epoch: 14 Global Step: 183370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:37:39,226-Speed 3235.40 samples/sec Loss 2.1933 LearningRate 0.0069 Epoch: 14 Global Step: 183380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:37:42,350-Speed 3279.70 samples/sec Loss 2.1959 LearningRate 0.0069 Epoch: 14 Global Step: 183390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:37:45,427-Speed 3329.84 samples/sec Loss 2.1739 LearningRate 0.0069 Epoch: 14 Global Step: 183400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:37:48,502-Speed 3330.49 samples/sec Loss 2.1714 LearningRate 0.0068 Epoch: 14 Global Step: 183410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:37:51,612-Speed 3294.27 samples/sec Loss 2.1704 LearningRate 0.0068 Epoch: 14 Global Step: 183420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
17:37:54,716-Speed 3300.11 samples/sec Loss 2.2204 LearningRate 0.0068 Epoch: 14 Global Step: 183430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:37:57,802-Speed 3319.10 samples/sec Loss 2.1836 LearningRate 0.0068 Epoch: 14 Global Step: 183440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:38:00,917-Speed 3288.21 samples/sec Loss 2.1257 LearningRate 0.0068 Epoch: 14 Global Step: 183450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:38:04,080-Speed 3238.53 samples/sec Loss 2.1810 LearningRate 0.0068 Epoch: 14 Global Step: 183460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:38:07,240-Speed 3241.33 samples/sec Loss 2.1676 LearningRate 0.0068 Epoch: 14 Global Step: 183470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:38:10,296-Speed 3352.42 samples/sec Loss 2.1659 LearningRate 0.0068 Epoch: 14 Global Step: 183480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 17:38:13,410-Speed 3289.45 samples/sec Loss 2.2079 LearningRate 0.0068 Epoch: 14 Global Step: 183490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:38:16,544-Speed 3268.63 samples/sec Loss 2.2187 LearningRate 0.0068 Epoch: 14 Global Step: 183500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:38:19,636-Speed 3313.33 samples/sec Loss 2.2072 LearningRate 0.0068 Epoch: 14 Global Step: 183510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:38:22,714-Speed 3327.97 samples/sec Loss 2.2014 LearningRate 0.0068 Epoch: 14 Global Step: 183520 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:25,831-Speed 3285.82 samples/sec Loss 2.1764 LearningRate 0.0068 Epoch: 14 Global Step: 183530 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:28,976-Speed 3257.56 samples/sec Loss 2.1743 LearningRate 0.0068 Epoch: 14 Global Step: 183540 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:32,100-Speed 3278.58 samples/sec Loss 
2.2376 LearningRate 0.0068 Epoch: 14 Global Step: 183550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:35,191-Speed 3312.94 samples/sec Loss 2.1811 LearningRate 0.0068 Epoch: 14 Global Step: 183560 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:38,352-Speed 3240.80 samples/sec Loss 2.1890 LearningRate 0.0068 Epoch: 14 Global Step: 183570 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:41,442-Speed 3315.20 samples/sec Loss 2.2275 LearningRate 0.0068 Epoch: 14 Global Step: 183580 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:44,491-Speed 3360.04 samples/sec Loss 2.2218 LearningRate 0.0068 Epoch: 14 Global Step: 183590 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:47,559-Speed 3338.74 samples/sec Loss 2.2068 LearningRate 0.0068 Epoch: 14 Global Step: 183600 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:50,655-Speed 3308.38 samples/sec Loss 2.2431 LearningRate 0.0068 Epoch: 14 Global Step: 183610 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:38:53,756-Speed 3303.17 samples/sec Loss 2.2152 LearningRate 0.0068 Epoch: 14 Global Step: 183620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:38:56,806-Speed 3358.75 samples/sec Loss 2.1505 LearningRate 0.0068 Epoch: 14 Global Step: 183630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:38:59,865-Speed 3348.24 samples/sec Loss 2.1801 LearningRate 0.0068 Epoch: 14 Global Step: 183640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:03,012-Speed 3255.29 samples/sec Loss 2.2338 LearningRate 0.0068 Epoch: 14 Global Step: 183650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:06,095-Speed 3321.43 samples/sec Loss 2.2411 LearningRate 0.0068 Epoch: 14 Global Step: 183660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:09,235-Speed 3262.77 samples/sec Loss 2.1903 LearningRate 0.0068 Epoch: 14 Global Step: 
183670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:12,383-Speed 3254.17 samples/sec Loss 2.1662 LearningRate 0.0068 Epoch: 14 Global Step: 183680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:15,495-Speed 3291.81 samples/sec Loss 2.1815 LearningRate 0.0068 Epoch: 14 Global Step: 183690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:18,650-Speed 3246.10 samples/sec Loss 2.2147 LearningRate 0.0068 Epoch: 14 Global Step: 183700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:21,723-Speed 3333.51 samples/sec Loss 2.1738 LearningRate 0.0068 Epoch: 14 Global Step: 183710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:24,876-Speed 3248.64 samples/sec Loss 2.2099 LearningRate 0.0068 Epoch: 14 Global Step: 183720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:27,997-Speed 3282.59 samples/sec Loss 2.3130 LearningRate 0.0068 Epoch: 14 Global Step: 183730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:31,147-Speed 3251.22 samples/sec Loss 2.2278 LearningRate 0.0068 Epoch: 14 Global Step: 183740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:34,205-Speed 3349.73 samples/sec Loss 2.2006 LearningRate 0.0068 Epoch: 14 Global Step: 183750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:37,300-Speed 3309.25 samples/sec Loss 2.1399 LearningRate 0.0068 Epoch: 14 Global Step: 183760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:40,369-Speed 3338.15 samples/sec Loss 2.1570 LearningRate 0.0068 Epoch: 14 Global Step: 183770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:43,467-Speed 3306.40 samples/sec Loss 2.2756 LearningRate 0.0068 Epoch: 14 Global Step: 183780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:46,552-Speed 3320.29 samples/sec Loss 2.1968 LearningRate 0.0068 Epoch: 14 Global Step: 183790 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-04-27 17:39:49,686-Speed 3268.06 samples/sec Loss 2.1431 LearningRate 0.0068 Epoch: 14 Global Step: 183800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:52,804-Speed 3285.78 samples/sec Loss 2.1473 LearningRate 0.0068 Epoch: 14 Global Step: 183810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:39:55,850-Speed 3363.03 samples/sec Loss 2.2037 LearningRate 0.0068 Epoch: 14 Global Step: 183820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:39:59,008-Speed 3243.34 samples/sec Loss 2.2102 LearningRate 0.0068 Epoch: 14 Global Step: 183830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:02,170-Speed 3239.97 samples/sec Loss 2.0925 LearningRate 0.0068 Epoch: 14 Global Step: 183840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:05,309-Speed 3262.67 samples/sec Loss 2.1755 LearningRate 0.0068 Epoch: 14 Global Step: 183850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:08,426-Speed 3287.28 samples/sec Loss 2.2661 LearningRate 0.0068 Epoch: 14 Global Step: 183860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:11,505-Speed 3326.90 samples/sec Loss 2.2265 LearningRate 0.0068 Epoch: 14 Global Step: 183870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:14,610-Speed 3298.12 samples/sec Loss 2.1706 LearningRate 0.0067 Epoch: 14 Global Step: 183880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:17,701-Speed 3314.09 samples/sec Loss 2.1714 LearningRate 0.0067 Epoch: 14 Global Step: 183890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:20,789-Speed 3317.08 samples/sec Loss 2.1255 LearningRate 0.0067 Epoch: 14 Global Step: 183900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:40:23,940-Speed 3250.39 samples/sec Loss 2.2136 LearningRate 0.0067 Epoch: 14 Global Step: 183910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
17:40:27,032-Speed 3313.35 samples/sec Loss 2.2368 LearningRate 0.0067 Epoch: 14 Global Step: 183920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:30,259-Speed 3174.38 samples/sec Loss 2.2147 LearningRate 0.0067 Epoch: 14 Global Step: 183930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:33,416-Speed 3244.22 samples/sec Loss 2.1978 LearningRate 0.0067 Epoch: 14 Global Step: 183940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:36,485-Speed 3338.23 samples/sec Loss 2.1329 LearningRate 0.0067 Epoch: 14 Global Step: 183950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:39,686-Speed 3199.91 samples/sec Loss 2.1336 LearningRate 0.0067 Epoch: 14 Global Step: 183960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:42,800-Speed 3289.46 samples/sec Loss 2.1458 LearningRate 0.0067 Epoch: 14 Global Step: 183970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:45,853-Speed 3355.25 samples/sec Loss 2.1641 LearningRate 0.0067 Epoch: 14 Global Step: 183980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:48,997-Speed 3258.07 samples/sec Loss 2.2215 LearningRate 0.0067 Epoch: 14 Global Step: 183990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:52,127-Speed 3272.12 samples/sec Loss 2.1989 LearningRate 0.0067 Epoch: 14 Global Step: 184000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:55,221-Speed 3310.38 samples/sec Loss 2.1814 LearningRate 0.0067 Epoch: 14 Global Step: 184010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:40:58,328-Speed 3297.90 samples/sec Loss 2.2027 LearningRate 0.0067 Epoch: 14 Global Step: 184020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:01,427-Speed 3305.33 samples/sec Loss 2.1697 LearningRate 0.0067 Epoch: 14 Global Step: 184030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:04,588-Speed 3240.09 samples/sec Loss 
2.2410 LearningRate 0.0067 Epoch: 14 Global Step: 184040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:07,780-Speed 3209.05 samples/sec Loss 2.2163 LearningRate 0.0067 Epoch: 14 Global Step: 184050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:10,851-Speed 3335.27 samples/sec Loss 2.2101 LearningRate 0.0067 Epoch: 14 Global Step: 184060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:13,955-Speed 3300.56 samples/sec Loss 2.1089 LearningRate 0.0067 Epoch: 14 Global Step: 184070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:17,120-Speed 3236.48 samples/sec Loss 2.1278 LearningRate 0.0067 Epoch: 14 Global Step: 184080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:20,253-Speed 3269.68 samples/sec Loss 2.1230 LearningRate 0.0067 Epoch: 14 Global Step: 184090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:23,430-Speed 3223.98 samples/sec Loss 2.2110 LearningRate 0.0067 Epoch: 14 Global Step: 184100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:26,566-Speed 3267.06 samples/sec Loss 2.2083 LearningRate 0.0067 Epoch: 14 Global Step: 184110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:29,654-Speed 3315.96 samples/sec Loss 2.1905 LearningRate 0.0067 Epoch: 14 Global Step: 184120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:32,728-Speed 3333.35 samples/sec Loss 2.2101 LearningRate 0.0067 Epoch: 14 Global Step: 184130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:35,788-Speed 3347.24 samples/sec Loss 2.2009 LearningRate 0.0067 Epoch: 14 Global Step: 184140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:38,851-Speed 3344.01 samples/sec Loss 2.2401 LearningRate 0.0067 Epoch: 14 Global Step: 184150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:41:41,967-Speed 3287.35 samples/sec Loss 2.1406 LearningRate 0.0067 Epoch: 14 Global 
Step: 184160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:45,056-Speed 3315.91 samples/sec Loss 2.2344 LearningRate 0.0067 Epoch: 14 Global Step: 184170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:48,174-Speed 3284.64 samples/sec Loss 2.2474 LearningRate 0.0067 Epoch: 14 Global Step: 184180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:51,307-Speed 3269.96 samples/sec Loss 2.1945 LearningRate 0.0067 Epoch: 14 Global Step: 184190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:54,518-Speed 3190.22 samples/sec Loss 2.1929 LearningRate 0.0067 Epoch: 14 Global Step: 184200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:41:57,618-Speed 3303.94 samples/sec Loss 2.2879 LearningRate 0.0067 Epoch: 14 Global Step: 184210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:00,686-Speed 3339.34 samples/sec Loss 2.2190 LearningRate 0.0067 Epoch: 14 Global Step: 184220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:03,799-Speed 3290.69 samples/sec Loss 2.2081 LearningRate 0.0067 Epoch: 14 Global Step: 184230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:06,961-Speed 3238.93 samples/sec Loss 2.1751 LearningRate 0.0067 Epoch: 14 Global Step: 184240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:10,062-Speed 3302.97 samples/sec Loss 2.2182 LearningRate 0.0067 Epoch: 14 Global Step: 184250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:13,190-Speed 3274.83 samples/sec Loss 2.2356 LearningRate 0.0067 Epoch: 14 Global Step: 184260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:16,327-Speed 3265.24 samples/sec Loss 2.1613 LearningRate 0.0067 Epoch: 14 Global Step: 184270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:19,399-Speed 3334.93 samples/sec Loss 2.2046 LearningRate 0.0067 Epoch: 14 Global Step: 184280 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 17:42:22,514-Speed 3288.26 samples/sec Loss 2.1782 LearningRate 0.0067 Epoch: 14 Global Step: 184290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:25,573-Speed 3349.21 samples/sec Loss 2.1443 LearningRate 0.0067 Epoch: 14 Global Step: 184300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:28,707-Speed 3267.62 samples/sec Loss 2.2006 LearningRate 0.0067 Epoch: 14 Global Step: 184310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:31,777-Speed 3336.41 samples/sec Loss 2.2106 LearningRate 0.0067 Epoch: 14 Global Step: 184320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:34,850-Speed 3334.29 samples/sec Loss 2.1232 LearningRate 0.0067 Epoch: 14 Global Step: 184330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:37,966-Speed 3286.76 samples/sec Loss 2.1865 LearningRate 0.0067 Epoch: 14 Global Step: 184340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:41,045-Speed 3327.08 samples/sec Loss 2.1212 LearningRate 0.0067 Epoch: 14 Global Step: 184350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:44,162-Speed 3286.18 samples/sec Loss 2.1826 LearningRate 0.0066 Epoch: 14 Global Step: 184360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:47,262-Speed 3304.43 samples/sec Loss 2.2371 LearningRate 0.0066 Epoch: 14 Global Step: 184370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:50,335-Speed 3333.47 samples/sec Loss 2.1041 LearningRate 0.0066 Epoch: 14 Global Step: 184380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 17:42:53,431-Speed 3308.86 samples/sec Loss 2.1855 LearningRate 0.0066 Epoch: 14 Global Step: 184390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:42:56,538-Speed 3296.18 samples/sec Loss 2.1572 LearningRate 0.0066 Epoch: 14 Global Step: 184400 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 
17:42:59,638-Speed 3304.01 samples/sec Loss 2.2423 LearningRate 0.0066 Epoch: 14 Global Step: 184410 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:02,798-Speed 3242.27 samples/sec Loss 2.2575 LearningRate 0.0066 Epoch: 14 Global Step: 184420 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:05,864-Speed 3340.54 samples/sec Loss 2.1988 LearningRate 0.0066 Epoch: 14 Global Step: 184430 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:08,955-Speed 3314.32 samples/sec Loss 2.2569 LearningRate 0.0066 Epoch: 14 Global Step: 184440 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:12,021-Speed 3340.48 samples/sec Loss 2.2067 LearningRate 0.0066 Epoch: 14 Global Step: 184450 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:15,132-Speed 3292.84 samples/sec Loss 2.1794 LearningRate 0.0066 Epoch: 14 Global Step: 184460 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:18,261-Speed 3274.09 samples/sec Loss 2.1796 LearningRate 0.0066 Epoch: 14 Global Step: 184470 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:21,321-Speed 3347.59 samples/sec Loss 2.1280 LearningRate 0.0066 Epoch: 14 Global Step: 184480 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:24,493-Speed 3229.45 samples/sec Loss 2.0993 LearningRate 0.0066 Epoch: 14 Global Step: 184490 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:27,633-Speed 3261.31 samples/sec Loss 2.1918 LearningRate 0.0066 Epoch: 14 Global Step: 184500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:30,717-Speed 3321.88 samples/sec Loss 2.2432 LearningRate 0.0066 Epoch: 14 Global Step: 184510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:33,808-Speed 3314.15 samples/sec Loss 2.1403 LearningRate 0.0066 Epoch: 14 Global Step: 184520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:36,902-Speed 3309.94 samples/sec Loss 2.2023 
LearningRate 0.0066 Epoch: 14 Global Step: 184530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:40,043-Speed 3261.57 samples/sec Loss 2.1916 LearningRate 0.0066 Epoch: 14 Global Step: 184540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:43,178-Speed 3267.23 samples/sec Loss 2.2499 LearningRate 0.0066 Epoch: 14 Global Step: 184550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:46,253-Speed 3331.66 samples/sec Loss 2.1578 LearningRate 0.0066 Epoch: 14 Global Step: 184560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:49,316-Speed 3343.37 samples/sec Loss 2.1588 LearningRate 0.0066 Epoch: 14 Global Step: 184570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:52,379-Speed 3343.96 samples/sec Loss 2.1858 LearningRate 0.0066 Epoch: 14 Global Step: 184580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:43:55,481-Speed 3302.94 samples/sec Loss 2.2565 LearningRate 0.0066 Epoch: 14 Global Step: 184590 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:43:58,591-Speed 3293.35 samples/sec Loss 2.2261 LearningRate 0.0066 Epoch: 14 Global Step: 184600 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:01,793-Speed 3199.67 samples/sec Loss 2.1316 LearningRate 0.0066 Epoch: 14 Global Step: 184610 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:04,915-Speed 3280.28 samples/sec Loss 2.1483 LearningRate 0.0066 Epoch: 14 Global Step: 184620 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:07,980-Speed 3342.16 samples/sec Loss 2.1546 LearningRate 0.0066 Epoch: 14 Global Step: 184630 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:11,061-Speed 3325.36 samples/sec Loss 2.1893 LearningRate 0.0066 Epoch: 14 Global Step: 184640 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:14,193-Speed 3269.90 samples/sec Loss 2.2594 LearningRate 0.0066 Epoch: 14 Global Step: 184650 
Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:17,356-Speed 3238.84 samples/sec Loss 2.1572 LearningRate 0.0066 Epoch: 14 Global Step: 184660 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:20,405-Speed 3359.88 samples/sec Loss 2.1628 LearningRate 0.0066 Epoch: 14 Global Step: 184670 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:23,540-Speed 3266.89 samples/sec Loss 2.1562 LearningRate 0.0066 Epoch: 14 Global Step: 184680 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:26,706-Speed 3235.84 samples/sec Loss 2.1633 LearningRate 0.0066 Epoch: 14 Global Step: 184690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:44:29,827-Speed 3282.18 samples/sec Loss 2.2720 LearningRate 0.0066 Epoch: 14 Global Step: 184700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:44:32,903-Speed 3329.63 samples/sec Loss 2.2004 LearningRate 0.0066 Epoch: 14 Global Step: 184710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:44:36,076-Speed 3229.03 samples/sec Loss 2.2355 LearningRate 0.0066 Epoch: 14 Global Step: 184720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:44:39,193-Speed 3285.44 samples/sec Loss 2.2174 LearningRate 0.0066 Epoch: 14 Global Step: 184730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:44:42,304-Speed 3293.12 samples/sec Loss 2.2172 LearningRate 0.0066 Epoch: 14 Global Step: 184740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:44:45,413-Speed 3295.27 samples/sec Loss 2.1920 LearningRate 0.0066 Epoch: 14 Global Step: 184750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 17:44:48,551-Speed 3263.74 samples/sec Loss 2.1646 LearningRate 0.0066 Epoch: 14 Global Step: 184760 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 17:44:51,772-Speed 3180.58 samples/sec Loss 2.1200 LearningRate 0.0066 Epoch: 14 Global Step: 184770 Fp16 Grad Scale: 8192 Required: 5 hours 
Training: 2022-04-27 17:44:54,916-Speed 3257.21 samples/sec Loss 2.1336 LearningRate 0.0066 Epoch: 14 Global Step: 184780 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:44:58,025-Speed 3295.34 samples/sec Loss 2.2120 LearningRate 0.0066 Epoch: 14 Global Step: 184790 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:45:01,198-Speed 3228.12 samples/sec Loss 2.2383 LearningRate 0.0066 Epoch: 14 Global Step: 184800 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:45:04,457-Speed 3143.40 samples/sec Loss 2.2196 LearningRate 0.0066 Epoch: 14 Global Step: 184810 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:45:07,584-Speed 3275.44 samples/sec Loss 2.2102 LearningRate 0.0066 Epoch: 14 Global Step: 184820 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:45:10,655-Speed 3335.97 samples/sec Loss 2.2254 LearningRate 0.0066 Epoch: 14 Global Step: 184830 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:45:13,853-Speed 3202.11 samples/sec Loss 2.1558 LearningRate 0.0066 Epoch: 14 Global Step: 184840 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:45:16,976-Speed 3280.31 samples/sec Loss 2.1988 LearningRate 0.0065 Epoch: 14 Global Step: 184850 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:45:20,074-Speed 3306.90 samples/sec Loss 2.2007 LearningRate 0.0065 Epoch: 14 Global Step: 184860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:23,198-Speed 3278.29 samples/sec Loss 2.2205 LearningRate 0.0065 Epoch: 14 Global Step: 184870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:26,298-Speed 3304.86 samples/sec Loss 2.1839 LearningRate 0.0065 Epoch: 14 Global Step: 184880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:29,442-Speed 3257.52 samples/sec Loss 2.2253 LearningRate 0.0065 Epoch: 14 Global Step: 184890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:32,567-Speed 3277.97 
samples/sec Loss 2.0979 LearningRate 0.0065 Epoch: 14 Global Step: 184900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:35,716-Speed 3252.55 samples/sec Loss 2.2167 LearningRate 0.0065 Epoch: 14 Global Step: 184910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:38,877-Speed 3240.77 samples/sec Loss 2.2330 LearningRate 0.0065 Epoch: 14 Global Step: 184920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:42,001-Speed 3278.94 samples/sec Loss 2.1889 LearningRate 0.0065 Epoch: 14 Global Step: 184930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:45,085-Speed 3322.51 samples/sec Loss 2.1203 LearningRate 0.0065 Epoch: 14 Global Step: 184940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:48,262-Speed 3223.88 samples/sec Loss 2.2709 LearningRate 0.0065 Epoch: 14 Global Step: 184950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:45:51,349-Speed 3318.57 samples/sec Loss 2.1781 LearningRate 0.0065 Epoch: 14 Global Step: 184960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:45:54,583-Speed 3167.46 samples/sec Loss 2.1708 LearningRate 0.0065 Epoch: 14 Global Step: 184970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:45:57,672-Speed 3316.08 samples/sec Loss 2.2215 LearningRate 0.0065 Epoch: 14 Global Step: 184980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:00,762-Speed 3314.29 samples/sec Loss 2.1737 LearningRate 0.0065 Epoch: 14 Global Step: 184990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:03,929-Speed 3234.70 samples/sec Loss 2.1801 LearningRate 0.0065 Epoch: 14 Global Step: 185000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:07,079-Speed 3251.71 samples/sec Loss 2.2161 LearningRate 0.0065 Epoch: 14 Global Step: 185010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:10,188-Speed 3294.36 samples/sec Loss 2.1136 LearningRate 0.0065 
Epoch: 14 Global Step: 185020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:13,288-Speed 3304.10 samples/sec Loss 2.2640 LearningRate 0.0065 Epoch: 14 Global Step: 185030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:16,387-Speed 3305.58 samples/sec Loss 2.1882 LearningRate 0.0065 Epoch: 14 Global Step: 185040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:19,534-Speed 3255.35 samples/sec Loss 2.2305 LearningRate 0.0065 Epoch: 14 Global Step: 185050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:22,635-Speed 3303.20 samples/sec Loss 2.2727 LearningRate 0.0065 Epoch: 14 Global Step: 185060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 17:46:25,750-Speed 3288.32 samples/sec Loss 2.1891 LearningRate 0.0065 Epoch: 14 Global Step: 185070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:28,875-Speed 3277.83 samples/sec Loss 2.2166 LearningRate 0.0065 Epoch: 14 Global Step: 185080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:31,980-Speed 3299.06 samples/sec Loss 2.2112 LearningRate 0.0065 Epoch: 14 Global Step: 185090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:35,125-Speed 3256.58 samples/sec Loss 2.1633 LearningRate 0.0065 Epoch: 14 Global Step: 185100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:38,221-Speed 3309.03 samples/sec Loss 2.1539 LearningRate 0.0065 Epoch: 14 Global Step: 185110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:46:41,286-Speed 3341.87 samples/sec Loss 2.1245 LearningRate 0.0065 Epoch: 14 Global Step: 185120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:46:44,357-Speed 3335.97 samples/sec Loss 2.2252 LearningRate 0.0065 Epoch: 14 Global Step: 185130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:46:47,470-Speed 3290.44 samples/sec Loss 2.1422 LearningRate 0.0065 Epoch: 14 Global Step: 185140 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 17:46:50,582-Speed 3291.21 samples/sec Loss 2.2553 LearningRate 0.0065 Epoch: 14 Global Step: 185150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:46:53,696-Speed 3289.14 samples/sec Loss 2.1636 LearningRate 0.0065 Epoch: 14 Global Step: 185160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:46:56,801-Speed 3299.23 samples/sec Loss 2.2177 LearningRate 0.0065 Epoch: 14 Global Step: 185170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:46:59,886-Speed 3319.97 samples/sec Loss 2.2178 LearningRate 0.0065 Epoch: 14 Global Step: 185180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:02,985-Speed 3305.14 samples/sec Loss 2.1869 LearningRate 0.0065 Epoch: 14 Global Step: 185190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:06,138-Speed 3249.04 samples/sec Loss 2.1656 LearningRate 0.0065 Epoch: 14 Global Step: 185200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:09,272-Speed 3269.01 samples/sec Loss 2.2386 LearningRate 0.0065 Epoch: 14 Global Step: 185210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:12,373-Speed 3302.80 samples/sec Loss 2.2486 LearningRate 0.0065 Epoch: 14 Global Step: 185220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:47:15,521-Speed 3254.06 samples/sec Loss 2.1888 LearningRate 0.0065 Epoch: 14 Global Step: 185230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:47:18,633-Speed 3291.40 samples/sec Loss 2.2256 LearningRate 0.0065 Epoch: 14 Global Step: 185240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:47:21,744-Speed 3292.06 samples/sec Loss 2.1532 LearningRate 0.0065 Epoch: 14 Global Step: 185250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:47:24,919-Speed 3226.91 samples/sec Loss 2.2358 LearningRate 0.0065 Epoch: 14 Global Step: 185260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-04-27 17:47:28,102-Speed 3217.76 samples/sec Loss 2.1786 LearningRate 0.0065 Epoch: 14 Global Step: 185270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:47:31,211-Speed 3294.86 samples/sec Loss 2.1626 LearningRate 0.0065 Epoch: 14 Global Step: 185280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:34,318-Speed 3297.21 samples/sec Loss 2.1130 LearningRate 0.0065 Epoch: 14 Global Step: 185290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:37,437-Speed 3284.55 samples/sec Loss 2.1628 LearningRate 0.0065 Epoch: 14 Global Step: 185300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:40,595-Speed 3242.92 samples/sec Loss 2.1643 LearningRate 0.0065 Epoch: 14 Global Step: 185310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:47:43,739-Speed 3258.37 samples/sec Loss 2.2185 LearningRate 0.0065 Epoch: 14 Global Step: 185320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:47:46,881-Speed 3259.46 samples/sec Loss 2.2075 LearningRate 0.0064 Epoch: 14 Global Step: 185330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:47:50,074-Speed 3207.98 samples/sec Loss 2.2482 LearningRate 0.0064 Epoch: 14 Global Step: 185340 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:47:53,200-Speed 3276.62 samples/sec Loss 2.1796 LearningRate 0.0064 Epoch: 14 Global Step: 185350 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:47:56,272-Speed 3334.25 samples/sec Loss 2.1561 LearningRate 0.0064 Epoch: 14 Global Step: 185360 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:47:59,358-Speed 3319.54 samples/sec Loss 2.1187 LearningRate 0.0064 Epoch: 14 Global Step: 185370 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:48:02,535-Speed 3224.47 samples/sec Loss 2.1887 LearningRate 0.0064 Epoch: 14 Global Step: 185380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:48:05,639-Speed 3299.25 samples/sec 
Loss 2.1308 LearningRate 0.0064 Epoch: 14 Global Step: 185390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:48:08,720-Speed 3325.70 samples/sec Loss 2.1920 LearningRate 0.0064 Epoch: 14 Global Step: 185400 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:48:11,872-Speed 3249.78 samples/sec Loss 2.2095 LearningRate 0.0064 Epoch: 14 Global Step: 185410 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:48:15,016-Speed 3257.76 samples/sec Loss 2.1611 LearningRate 0.0064 Epoch: 14 Global Step: 185420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:18,114-Speed 3306.44 samples/sec Loss 2.1912 LearningRate 0.0064 Epoch: 14 Global Step: 185430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:21,223-Speed 3294.40 samples/sec Loss 2.2271 LearningRate 0.0064 Epoch: 14 Global Step: 185440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:24,315-Speed 3313.09 samples/sec Loss 2.1742 LearningRate 0.0064 Epoch: 14 Global Step: 185450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:27,517-Speed 3198.77 samples/sec Loss 2.2334 LearningRate 0.0064 Epoch: 14 Global Step: 185460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:30,618-Speed 3302.88 samples/sec Loss 2.1903 LearningRate 0.0064 Epoch: 14 Global Step: 185470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:33,720-Speed 3302.52 samples/sec Loss 2.1711 LearningRate 0.0064 Epoch: 14 Global Step: 185480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:36,866-Speed 3255.32 samples/sec Loss 2.2429 LearningRate 0.0064 Epoch: 14 Global Step: 185490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:40,028-Speed 3239.94 samples/sec Loss 2.2274 LearningRate 0.0064 Epoch: 14 Global Step: 185500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:43,147-Speed 3284.14 samples/sec Loss 2.1729 LearningRate 0.0064 Epoch: 14 
Global Step: 185510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:46,225-Speed 3328.17 samples/sec Loss 2.1592 LearningRate 0.0064 Epoch: 14 Global Step: 185520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:48:49,332-Speed 3297.34 samples/sec Loss 2.2006 LearningRate 0.0064 Epoch: 14 Global Step: 185530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:52,508-Speed 3224.57 samples/sec Loss 2.1994 LearningRate 0.0064 Epoch: 14 Global Step: 185540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:55,633-Speed 3278.69 samples/sec Loss 2.1687 LearningRate 0.0064 Epoch: 14 Global Step: 185550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:48:58,746-Speed 3290.57 samples/sec Loss 2.1959 LearningRate 0.0064 Epoch: 14 Global Step: 185560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:49:01,952-Speed 3194.64 samples/sec Loss 2.2033 LearningRate 0.0064 Epoch: 14 Global Step: 185570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:49:05,025-Speed 3332.78 samples/sec Loss 2.1817 LearningRate 0.0064 Epoch: 14 Global Step: 185580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:49:08,108-Speed 3323.22 samples/sec Loss 2.1337 LearningRate 0.0064 Epoch: 14 Global Step: 185590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:49:11,215-Speed 3295.94 samples/sec Loss 2.1870 LearningRate 0.0064 Epoch: 14 Global Step: 185600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:49:14,356-Speed 3261.30 samples/sec Loss 2.1995 LearningRate 0.0064 Epoch: 14 Global Step: 185610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:49:17,518-Speed 3240.13 samples/sec Loss 2.1976 LearningRate 0.0064 Epoch: 14 Global Step: 185620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:49:20,629-Speed 3292.46 samples/sec Loss 2.2436 LearningRate 0.0064 Epoch: 14 Global Step: 185630 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-04-27 17:49:23,724-Speed 3309.36 samples/sec Loss 2.1085 LearningRate 0.0064 Epoch: 14 Global Step: 185640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:26,825-Speed 3303.70 samples/sec Loss 2.2468 LearningRate 0.0064 Epoch: 14 Global Step: 185650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:29,930-Speed 3299.01 samples/sec Loss 2.1970 LearningRate 0.0064 Epoch: 14 Global Step: 185660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:33,024-Speed 3309.95 samples/sec Loss 2.1489 LearningRate 0.0064 Epoch: 14 Global Step: 185670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:36,153-Speed 3273.94 samples/sec Loss 2.2308 LearningRate 0.0064 Epoch: 14 Global Step: 185680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:39,290-Speed 3265.21 samples/sec Loss 2.1804 LearningRate 0.0064 Epoch: 14 Global Step: 185690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:42,429-Speed 3263.29 samples/sec Loss 2.1063 LearningRate 0.0064 Epoch: 14 Global Step: 185700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:45,535-Speed 3298.40 samples/sec Loss 2.2377 LearningRate 0.0064 Epoch: 14 Global Step: 185710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:48,654-Speed 3283.62 samples/sec Loss 2.2515 LearningRate 0.0064 Epoch: 14 Global Step: 185720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:51,811-Speed 3245.21 samples/sec Loss 2.1789 LearningRate 0.0064 Epoch: 14 Global Step: 185730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 17:49:54,905-Speed 3310.05 samples/sec Loss 2.1582 LearningRate 0.0064 Epoch: 14 Global Step: 185740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:49:58,011-Speed 3298.00 samples/sec Loss 2.1754 LearningRate 0.0064 Epoch: 14 Global Step: 185750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
17:50:01,140-Speed 3273.65 samples/sec Loss 2.1951 LearningRate 0.0064 Epoch: 14 Global Step: 185760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:50:04,267-Speed 3275.70 samples/sec Loss 2.1945 LearningRate 0.0064 Epoch: 14 Global Step: 185770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:07,414-Speed 3255.35 samples/sec Loss 2.1865 LearningRate 0.0064 Epoch: 14 Global Step: 185780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:10,519-Speed 3298.46 samples/sec Loss 2.1312 LearningRate 0.0064 Epoch: 14 Global Step: 185790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:13,639-Speed 3283.65 samples/sec Loss 2.1518 LearningRate 0.0064 Epoch: 14 Global Step: 185800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:16,744-Speed 3299.14 samples/sec Loss 2.1673 LearningRate 0.0064 Epoch: 14 Global Step: 185810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:19,825-Speed 3324.19 samples/sec Loss 2.1979 LearningRate 0.0064 Epoch: 14 Global Step: 185820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:22,905-Speed 3326.34 samples/sec Loss 2.2280 LearningRate 0.0063 Epoch: 14 Global Step: 185830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:26,000-Speed 3308.90 samples/sec Loss 2.2249 LearningRate 0.0063 Epoch: 14 Global Step: 185840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:29,070-Speed 3336.40 samples/sec Loss 2.2040 LearningRate 0.0063 Epoch: 14 Global Step: 185850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:50:32,125-Speed 3353.87 samples/sec Loss 2.2078 LearningRate 0.0063 Epoch: 14 Global Step: 185860 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:35,292-Speed 3233.78 samples/sec Loss 2.2484 LearningRate 0.0063 Epoch: 14 Global Step: 185870 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:38,364-Speed 3334.28 samples/sec Loss 
2.1114 LearningRate 0.0063 Epoch: 14 Global Step: 185880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:41,524-Speed 3241.03 samples/sec Loss 2.1920 LearningRate 0.0063 Epoch: 14 Global Step: 185890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:44,608-Speed 3322.54 samples/sec Loss 2.2266 LearningRate 0.0063 Epoch: 14 Global Step: 185900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:47,705-Speed 3307.20 samples/sec Loss 2.1834 LearningRate 0.0063 Epoch: 14 Global Step: 185910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:50,888-Speed 3217.57 samples/sec Loss 2.1585 LearningRate 0.0063 Epoch: 14 Global Step: 185920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:54,015-Speed 3275.76 samples/sec Loss 2.0996 LearningRate 0.0063 Epoch: 14 Global Step: 185930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:50:57,085-Speed 3336.35 samples/sec Loss 2.1578 LearningRate 0.0063 Epoch: 14 Global Step: 185940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:51:00,212-Speed 3276.48 samples/sec Loss 2.1821 LearningRate 0.0063 Epoch: 14 Global Step: 185950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:51:03,398-Speed 3215.09 samples/sec Loss 2.1986 LearningRate 0.0063 Epoch: 14 Global Step: 185960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:06,551-Speed 3248.44 samples/sec Loss 2.1194 LearningRate 0.0063 Epoch: 14 Global Step: 185970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:09,652-Speed 3302.84 samples/sec Loss 2.1945 LearningRate 0.0063 Epoch: 14 Global Step: 185980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:12,780-Speed 3274.66 samples/sec Loss 2.1902 LearningRate 0.0063 Epoch: 14 Global Step: 185990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:15,909-Speed 3273.55 samples/sec Loss 2.1773 LearningRate 0.0063 Epoch: 14 Global Step: 
186000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:19,006-Speed 3308.26 samples/sec Loss 2.2523 LearningRate 0.0063 Epoch: 14 Global Step: 186010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:22,104-Speed 3305.57 samples/sec Loss 2.2124 LearningRate 0.0063 Epoch: 14 Global Step: 186020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:25,186-Speed 3323.66 samples/sec Loss 2.2204 LearningRate 0.0063 Epoch: 14 Global Step: 186030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:28,293-Speed 3296.82 samples/sec Loss 2.1442 LearningRate 0.0063 Epoch: 14 Global Step: 186040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:31,374-Speed 3324.86 samples/sec Loss 2.1860 LearningRate 0.0063 Epoch: 14 Global Step: 186050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:34,467-Speed 3312.19 samples/sec Loss 2.1941 LearningRate 0.0063 Epoch: 14 Global Step: 186060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:51:37,559-Speed 3311.98 samples/sec Loss 2.1733 LearningRate 0.0063 Epoch: 14 Global Step: 186070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:40,704-Speed 3256.88 samples/sec Loss 2.2061 LearningRate 0.0063 Epoch: 14 Global Step: 186080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:43,794-Speed 3315.72 samples/sec Loss 2.1367 LearningRate 0.0063 Epoch: 14 Global Step: 186090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:46,883-Speed 3315.43 samples/sec Loss 2.2436 LearningRate 0.0063 Epoch: 14 Global Step: 186100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:49,969-Speed 3319.62 samples/sec Loss 2.1613 LearningRate 0.0063 Epoch: 14 Global Step: 186110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:53,080-Speed 3292.91 samples/sec Loss 2.1114 LearningRate 0.0063 Epoch: 14 Global Step: 186120 Fp16 Grad Scale: 16384 Required: 5 
hours Training: 2022-04-27 17:51:56,196-Speed 3287.30 samples/sec Loss 2.2174 LearningRate 0.0063 Epoch: 14 Global Step: 186130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:51:59,268-Speed 3334.59 samples/sec Loss 2.1952 LearningRate 0.0063 Epoch: 14 Global Step: 186140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:52:02,366-Speed 3306.39 samples/sec Loss 2.1803 LearningRate 0.0063 Epoch: 14 Global Step: 186150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:52:05,478-Speed 3291.36 samples/sec Loss 2.1802 LearningRate 0.0063 Epoch: 14 Global Step: 186160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:52:08,581-Speed 3301.40 samples/sec Loss 2.2070 LearningRate 0.0063 Epoch: 14 Global Step: 186170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:52:11,695-Speed 3289.26 samples/sec Loss 2.1357 LearningRate 0.0063 Epoch: 14 Global Step: 186180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:52:14,836-Speed 3260.39 samples/sec Loss 2.1425 LearningRate 0.0063 Epoch: 14 Global Step: 186190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:52:17,975-Speed 3263.21 samples/sec Loss 2.2017 LearningRate 0.0063 Epoch: 14 Global Step: 186200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:52:21,042-Speed 3340.52 samples/sec Loss 2.1978 LearningRate 0.0063 Epoch: 14 Global Step: 186210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:52:24,215-Speed 3227.73 samples/sec Loss 2.1833 LearningRate 0.0063 Epoch: 14 Global Step: 186220 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:27,335-Speed 3283.52 samples/sec Loss 2.1921 LearningRate 0.0063 Epoch: 14 Global Step: 186230 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:30,525-Speed 3210.31 samples/sec Loss 2.1905 LearningRate 0.0063 Epoch: 14 Global Step: 186240 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:33,611-Speed 
3320.22 samples/sec Loss 2.1380 LearningRate 0.0063 Epoch: 14 Global Step: 186250 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:36,741-Speed 3272.42 samples/sec Loss 2.1577 LearningRate 0.0063 Epoch: 14 Global Step: 186260 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:39,871-Speed 3272.01 samples/sec Loss 2.1961 LearningRate 0.0063 Epoch: 14 Global Step: 186270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:42,981-Speed 3293.66 samples/sec Loss 2.0792 LearningRate 0.0063 Epoch: 14 Global Step: 186280 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:46,042-Speed 3346.45 samples/sec Loss 2.2293 LearningRate 0.0063 Epoch: 14 Global Step: 186290 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:49,108-Speed 3340.80 samples/sec Loss 2.1764 LearningRate 0.0063 Epoch: 14 Global Step: 186300 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:52:52,478-Speed 3039.75 samples/sec Loss 2.1778 LearningRate 0.0063 Epoch: 14 Global Step: 186310 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:53:25,320-Speed 311.81 samples/sec Loss 1.9082 LearningRate 0.0062 Epoch: 15 Global Step: 186320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:53:28,725-Speed 3008.47 samples/sec Loss 1.5385 LearningRate 0.0062 Epoch: 15 Global Step: 186330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:53:31,912-Speed 3213.73 samples/sec Loss 1.4778 LearningRate 0.0062 Epoch: 15 Global Step: 186340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:53:34,997-Speed 3320.55 samples/sec Loss 1.5979 LearningRate 0.0062 Epoch: 15 Global Step: 186350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:53:38,182-Speed 3216.18 samples/sec Loss 1.5778 LearningRate 0.0062 Epoch: 15 Global Step: 186360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:53:41,257-Speed 3330.99 samples/sec Loss 1.5635 LearningRate 0.0062 
Epoch: 15 Global Step: 186370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:53:44,381-Speed 3279.64 samples/sec Loss 1.5261 LearningRate 0.0062 Epoch: 15 Global Step: 186380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:53:47,598-Speed 3183.88 samples/sec Loss 1.5827 LearningRate 0.0062 Epoch: 15 Global Step: 186390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:53:50,680-Speed 3324.16 samples/sec Loss 1.4979 LearningRate 0.0062 Epoch: 15 Global Step: 186400 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:53:53,787-Speed 3296.97 samples/sec Loss 1.5849 LearningRate 0.0062 Epoch: 15 Global Step: 186410 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:53:56,869-Speed 3323.07 samples/sec Loss 1.5419 LearningRate 0.0062 Epoch: 15 Global Step: 186420 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:54:00,096-Speed 3174.62 samples/sec Loss 1.5539 LearningRate 0.0062 Epoch: 15 Global Step: 186430 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:54:03,327-Speed 3169.68 samples/sec Loss 1.5540 LearningRate 0.0062 Epoch: 15 Global Step: 186440 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:54:06,631-Speed 3100.44 samples/sec Loss 1.5229 LearningRate 0.0062 Epoch: 15 Global Step: 186450 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:54:09,926-Speed 3108.52 samples/sec Loss 1.6084 LearningRate 0.0062 Epoch: 15 Global Step: 186460 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:54:13,019-Speed 3311.72 samples/sec Loss 1.5457 LearningRate 0.0062 Epoch: 15 Global Step: 186470 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:54:16,124-Speed 3299.09 samples/sec Loss 1.5452 LearningRate 0.0062 Epoch: 15 Global Step: 186480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:19,219-Speed 3309.79 samples/sec Loss 1.5404 LearningRate 0.0062 Epoch: 15 Global Step: 186490 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 17:54:22,319-Speed 3304.37 samples/sec Loss 1.5666 LearningRate 0.0062 Epoch: 15 Global Step: 186500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:25,444-Speed 3278.25 samples/sec Loss 1.5374 LearningRate 0.0062 Epoch: 15 Global Step: 186510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:28,573-Speed 3273.19 samples/sec Loss 1.5853 LearningRate 0.0062 Epoch: 15 Global Step: 186520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:31,664-Speed 3313.88 samples/sec Loss 1.5919 LearningRate 0.0062 Epoch: 15 Global Step: 186530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:34,740-Speed 3329.70 samples/sec Loss 1.5658 LearningRate 0.0062 Epoch: 15 Global Step: 186540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:37,844-Speed 3301.11 samples/sec Loss 1.5225 LearningRate 0.0062 Epoch: 15 Global Step: 186550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:40,983-Speed 3262.36 samples/sec Loss 1.5348 LearningRate 0.0062 Epoch: 15 Global Step: 186560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:44,041-Speed 3349.64 samples/sec Loss 1.5657 LearningRate 0.0062 Epoch: 15 Global Step: 186570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:54:47,129-Speed 3317.78 samples/sec Loss 1.5525 LearningRate 0.0062 Epoch: 15 Global Step: 186580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:54:50,263-Speed 3267.77 samples/sec Loss 1.5634 LearningRate 0.0062 Epoch: 15 Global Step: 186590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:54:53,371-Speed 3295.79 samples/sec Loss 1.5164 LearningRate 0.0062 Epoch: 15 Global Step: 186600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:54:56,462-Speed 3314.54 samples/sec Loss 1.6248 LearningRate 0.0062 Epoch: 15 Global Step: 186610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
17:54:59,566-Speed 3299.46 samples/sec Loss 1.5790 LearningRate 0.0062 Epoch: 15 Global Step: 186620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:02,757-Speed 3210.51 samples/sec Loss 1.5937 LearningRate 0.0062 Epoch: 15 Global Step: 186630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:05,860-Speed 3301.40 samples/sec Loss 1.5372 LearningRate 0.0062 Epoch: 15 Global Step: 186640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:08,916-Speed 3351.81 samples/sec Loss 1.5665 LearningRate 0.0062 Epoch: 15 Global Step: 186650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:12,034-Speed 3285.16 samples/sec Loss 1.6260 LearningRate 0.0062 Epoch: 15 Global Step: 186660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:15,202-Speed 3232.91 samples/sec Loss 1.6371 LearningRate 0.0062 Epoch: 15 Global Step: 186670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:18,324-Speed 3281.25 samples/sec Loss 1.5620 LearningRate 0.0062 Epoch: 15 Global Step: 186680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:21,378-Speed 3353.79 samples/sec Loss 1.5552 LearningRate 0.0062 Epoch: 15 Global Step: 186690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:24,550-Speed 3229.14 samples/sec Loss 1.5984 LearningRate 0.0062 Epoch: 15 Global Step: 186700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:27,685-Speed 3267.31 samples/sec Loss 1.6126 LearningRate 0.0062 Epoch: 15 Global Step: 186710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:55:30,823-Speed 3265.11 samples/sec Loss 1.6182 LearningRate 0.0062 Epoch: 15 Global Step: 186720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:34,655-Speed 2672.44 samples/sec Loss 1.5964 LearningRate 0.0062 Epoch: 15 Global Step: 186730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:37,778-Speed 3280.09 samples/sec Loss 
1.5992 LearningRate 0.0062 Epoch: 15 Global Step: 186740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:40,899-Speed 3281.71 samples/sec Loss 1.5561 LearningRate 0.0062 Epoch: 15 Global Step: 186750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:55:44,026-Speed 3276.34 samples/sec Loss 1.5877 LearningRate 0.0062 Epoch: 15 Global Step: 186760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:55:47,120-Speed 3309.81 samples/sec Loss 1.5524 LearningRate 0.0062 Epoch: 15 Global Step: 186770 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:55:50,227-Speed 3297.50 samples/sec Loss 1.5722 LearningRate 0.0062 Epoch: 15 Global Step: 186780 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:55:53,321-Speed 3310.42 samples/sec Loss 1.5437 LearningRate 0.0062 Epoch: 15 Global Step: 186790 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:55:56,402-Speed 3324.92 samples/sec Loss 1.5902 LearningRate 0.0062 Epoch: 15 Global Step: 186800 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:55:59,527-Speed 3276.93 samples/sec Loss 1.6010 LearningRate 0.0062 Epoch: 15 Global Step: 186810 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:56:02,671-Speed 3258.81 samples/sec Loss 1.6005 LearningRate 0.0061 Epoch: 15 Global Step: 186820 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:56:05,781-Speed 3293.83 samples/sec Loss 1.5801 LearningRate 0.0061 Epoch: 15 Global Step: 186830 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:56:08,867-Speed 3318.95 samples/sec Loss 1.5726 LearningRate 0.0061 Epoch: 15 Global Step: 186840 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:56:11,949-Speed 3323.63 samples/sec Loss 1.5685 LearningRate 0.0061 Epoch: 15 Global Step: 186850 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:56:15,064-Speed 3288.49 samples/sec Loss 1.5703 LearningRate 0.0061 Epoch: 15 Global Step: 
186860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:18,229-Speed 3235.84 samples/sec Loss 1.5348 LearningRate 0.0061 Epoch: 15 Global Step: 186870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:21,330-Speed 3302.98 samples/sec Loss 1.5793 LearningRate 0.0061 Epoch: 15 Global Step: 186880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:24,454-Speed 3278.92 samples/sec Loss 1.5858 LearningRate 0.0061 Epoch: 15 Global Step: 186890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:27,549-Speed 3309.43 samples/sec Loss 1.6092 LearningRate 0.0061 Epoch: 15 Global Step: 186900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:30,652-Speed 3301.55 samples/sec Loss 1.5831 LearningRate 0.0061 Epoch: 15 Global Step: 186910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:33,763-Speed 3292.57 samples/sec Loss 1.6123 LearningRate 0.0061 Epoch: 15 Global Step: 186920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:36,831-Speed 3338.30 samples/sec Loss 1.6561 LearningRate 0.0061 Epoch: 15 Global Step: 186930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:39,931-Speed 3304.75 samples/sec Loss 1.5582 LearningRate 0.0061 Epoch: 15 Global Step: 186940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:43,051-Speed 3282.74 samples/sec Loss 1.5736 LearningRate 0.0061 Epoch: 15 Global Step: 186950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:46,108-Speed 3351.30 samples/sec Loss 1.5846 LearningRate 0.0061 Epoch: 15 Global Step: 186960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:49,233-Speed 3277.25 samples/sec Loss 1.5797 LearningRate 0.0061 Epoch: 15 Global Step: 186970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:52,401-Speed 3233.79 samples/sec Loss 1.5504 LearningRate 0.0061 Epoch: 15 Global Step: 186980 Fp16 Grad Scale: 16384 Required: 5 
hours Training: 2022-04-27 17:56:55,546-Speed 3256.10 samples/sec Loss 1.5815 LearningRate 0.0061 Epoch: 15 Global Step: 186990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:56:58,626-Speed 3325.49 samples/sec Loss 1.6027 LearningRate 0.0061 Epoch: 15 Global Step: 187000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:01,731-Speed 3299.37 samples/sec Loss 1.5886 LearningRate 0.0061 Epoch: 15 Global Step: 187010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:04,842-Speed 3292.29 samples/sec Loss 1.5496 LearningRate 0.0061 Epoch: 15 Global Step: 187020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:07,956-Speed 3289.39 samples/sec Loss 1.5791 LearningRate 0.0061 Epoch: 15 Global Step: 187030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:11,080-Speed 3278.37 samples/sec Loss 1.6075 LearningRate 0.0061 Epoch: 15 Global Step: 187040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:14,199-Speed 3285.50 samples/sec Loss 1.6063 LearningRate 0.0061 Epoch: 15 Global Step: 187050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:17,358-Speed 3242.57 samples/sec Loss 1.5664 LearningRate 0.0061 Epoch: 15 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:20,482-Speed 3278.07 samples/sec Loss 1.6166 LearningRate 0.0061 Epoch: 15 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:23,595-Speed 3290.61 samples/sec Loss 1.6414 LearningRate 0.0061 Epoch: 15 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:26,713-Speed 3285.66 samples/sec Loss 1.6316 LearningRate 0.0061 Epoch: 15 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:29,827-Speed 3288.53 samples/sec Loss 1.6202 LearningRate 0.0061 Epoch: 15 Global Step: 187100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
17:57:32,900-Speed 3334.01 samples/sec Loss 1.6102 LearningRate 0.0061 Epoch: 15 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:36,071-Speed 3230.46 samples/sec Loss 1.5894 LearningRate 0.0061 Epoch: 15 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:39,146-Speed 3330.18 samples/sec Loss 1.5692 LearningRate 0.0061 Epoch: 15 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:42,258-Speed 3291.49 samples/sec Loss 1.5919 LearningRate 0.0061 Epoch: 15 Global Step: 187140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:45,321-Speed 3344.71 samples/sec Loss 1.5928 LearningRate 0.0061 Epoch: 15 Global Step: 187150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:57:48,413-Speed 3313.02 samples/sec Loss 1.5758 LearningRate 0.0061 Epoch: 15 Global Step: 187160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 17:57:51,506-Speed 3311.83 samples/sec Loss 1.6463 LearningRate 0.0061 Epoch: 15 Global Step: 187170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:54,652-Speed 3255.67 samples/sec Loss 1.6057 LearningRate 0.0061 Epoch: 15 Global Step: 187180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:57:57,717-Speed 3341.70 samples/sec Loss 1.6303 LearningRate 0.0061 Epoch: 15 Global Step: 187190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:00,815-Speed 3306.50 samples/sec Loss 1.5624 LearningRate 0.0061 Epoch: 15 Global Step: 187200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:03,973-Speed 3244.34 samples/sec Loss 1.5583 LearningRate 0.0061 Epoch: 15 Global Step: 187210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:07,095-Speed 3280.78 samples/sec Loss 1.6271 LearningRate 0.0061 Epoch: 15 Global Step: 187220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:10,173-Speed 3327.78 samples/sec Loss 
1.5713 LearningRate 0.0061 Epoch: 15 Global Step: 187230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:13,331-Speed 3244.26 samples/sec Loss 1.5481 LearningRate 0.0061 Epoch: 15 Global Step: 187240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:16,417-Speed 3319.46 samples/sec Loss 1.6224 LearningRate 0.0061 Epoch: 15 Global Step: 187250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:19,511-Speed 3310.35 samples/sec Loss 1.6651 LearningRate 0.0061 Epoch: 15 Global Step: 187260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:22,569-Speed 3349.43 samples/sec Loss 1.6018 LearningRate 0.0061 Epoch: 15 Global Step: 187270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:25,645-Speed 3330.34 samples/sec Loss 1.6244 LearningRate 0.0061 Epoch: 15 Global Step: 187280 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:28,709-Speed 3342.70 samples/sec Loss 1.6286 LearningRate 0.0061 Epoch: 15 Global Step: 187290 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:31,817-Speed 3296.35 samples/sec Loss 1.5836 LearningRate 0.0061 Epoch: 15 Global Step: 187300 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:34,876-Speed 3348.43 samples/sec Loss 1.5879 LearningRate 0.0061 Epoch: 15 Global Step: 187310 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:38,055-Speed 3221.90 samples/sec Loss 1.5690 LearningRate 0.0060 Epoch: 15 Global Step: 187320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:41,144-Speed 3315.83 samples/sec Loss 1.5925 LearningRate 0.0060 Epoch: 15 Global Step: 187330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:44,250-Speed 3298.99 samples/sec Loss 1.5994 LearningRate 0.0060 Epoch: 15 Global Step: 187340 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:47,410-Speed 3241.61 samples/sec Loss 1.6047 LearningRate 0.0060 Epoch: 15 Global Step: 
187350 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:50,508-Speed 3305.77 samples/sec Loss 1.6312 LearningRate 0.0060 Epoch: 15 Global Step: 187360 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 17:58:53,610-Speed 3301.99 samples/sec Loss 1.5944 LearningRate 0.0060 Epoch: 15 Global Step: 187370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:56,686-Speed 3330.17 samples/sec Loss 1.5989 LearningRate 0.0060 Epoch: 15 Global Step: 187380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:58:59,860-Speed 3227.15 samples/sec Loss 1.6471 LearningRate 0.0060 Epoch: 15 Global Step: 187390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:02,979-Speed 3284.44 samples/sec Loss 1.5727 LearningRate 0.0060 Epoch: 15 Global Step: 187400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:06,090-Speed 3292.87 samples/sec Loss 1.5885 LearningRate 0.0060 Epoch: 15 Global Step: 187410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:09,179-Speed 3316.11 samples/sec Loss 1.6374 LearningRate 0.0060 Epoch: 15 Global Step: 187420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:12,237-Speed 3350.85 samples/sec Loss 1.6593 LearningRate 0.0060 Epoch: 15 Global Step: 187430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:15,315-Speed 3327.78 samples/sec Loss 1.6258 LearningRate 0.0060 Epoch: 15 Global Step: 187440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:18,428-Speed 3291.00 samples/sec Loss 1.6035 LearningRate 0.0060 Epoch: 15 Global Step: 187450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:21,537-Speed 3294.49 samples/sec Loss 1.6341 LearningRate 0.0060 Epoch: 15 Global Step: 187460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:24,637-Speed 3304.00 samples/sec Loss 1.5836 LearningRate 0.0060 Epoch: 15 Global Step: 187470 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-04-27 17:59:27,722-Speed 3320.31 samples/sec Loss 1.6423 LearningRate 0.0060 Epoch: 15 Global Step: 187480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:59:30,854-Speed 3270.22 samples/sec Loss 1.6144 LearningRate 0.0060 Epoch: 15 Global Step: 187490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:59:33,951-Speed 3308.04 samples/sec Loss 1.6418 LearningRate 0.0060 Epoch: 15 Global Step: 187500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:59:37,113-Speed 3240.09 samples/sec Loss 1.5749 LearningRate 0.0060 Epoch: 15 Global Step: 187510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:59:40,250-Speed 3264.58 samples/sec Loss 1.5832 LearningRate 0.0060 Epoch: 15 Global Step: 187520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:59:43,335-Speed 3320.54 samples/sec Loss 1.6501 LearningRate 0.0060 Epoch: 15 Global Step: 187530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:59:46,402-Speed 3340.47 samples/sec Loss 1.5697 LearningRate 0.0060 Epoch: 15 Global Step: 187540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 17:59:49,468-Speed 3340.65 samples/sec Loss 1.6430 LearningRate 0.0060 Epoch: 15 Global Step: 187550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:52,549-Speed 3324.34 samples/sec Loss 1.5844 LearningRate 0.0060 Epoch: 15 Global Step: 187560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:55,632-Speed 3323.13 samples/sec Loss 1.6107 LearningRate 0.0060 Epoch: 15 Global Step: 187570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 17:59:58,687-Speed 3352.81 samples/sec Loss 1.6502 LearningRate 0.0060 Epoch: 15 Global Step: 187580 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:01,812-Speed 3277.03 samples/sec Loss 1.6216 LearningRate 0.0060 Epoch: 15 Global Step: 187590 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:04,872-Speed 
3347.44 samples/sec Loss 1.5963 LearningRate 0.0060 Epoch: 15 Global Step: 187600 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:07,971-Speed 3305.97 samples/sec Loss 1.6244 LearningRate 0.0060 Epoch: 15 Global Step: 187610 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:11,059-Speed 3316.92 samples/sec Loss 1.6103 LearningRate 0.0060 Epoch: 15 Global Step: 187620 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:14,200-Speed 3261.84 samples/sec Loss 1.6029 LearningRate 0.0060 Epoch: 15 Global Step: 187630 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:17,324-Speed 3278.40 samples/sec Loss 1.5705 LearningRate 0.0060 Epoch: 15 Global Step: 187640 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:20,421-Speed 3307.42 samples/sec Loss 1.6038 LearningRate 0.0060 Epoch: 15 Global Step: 187650 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:23,501-Speed 3326.49 samples/sec Loss 1.5838 LearningRate 0.0060 Epoch: 15 Global Step: 187660 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:26,651-Speed 3251.07 samples/sec Loss 1.5707 LearningRate 0.0060 Epoch: 15 Global Step: 187670 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:00:29,766-Speed 3288.33 samples/sec Loss 1.6544 LearningRate 0.0060 Epoch: 15 Global Step: 187680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:32,834-Speed 3339.11 samples/sec Loss 1.6122 LearningRate 0.0060 Epoch: 15 Global Step: 187690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:35,952-Speed 3285.08 samples/sec Loss 1.6381 LearningRate 0.0060 Epoch: 15 Global Step: 187700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:39,089-Speed 3265.66 samples/sec Loss 1.6391 LearningRate 0.0060 Epoch: 15 Global Step: 187710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:42,170-Speed 3324.15 samples/sec Loss 1.6065 LearningRate 0.0060 
Epoch: 15 Global Step: 187720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:45,274-Speed 3300.02 samples/sec Loss 1.6245 LearningRate 0.0060 Epoch: 15 Global Step: 187730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:48,420-Speed 3255.62 samples/sec Loss 1.6475 LearningRate 0.0060 Epoch: 15 Global Step: 187740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:51,506-Speed 3319.06 samples/sec Loss 1.6187 LearningRate 0.0060 Epoch: 15 Global Step: 187750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:54,563-Speed 3350.96 samples/sec Loss 1.6161 LearningRate 0.0060 Epoch: 15 Global Step: 187760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:00:57,633-Speed 3340.68 samples/sec Loss 1.5622 LearningRate 0.0060 Epoch: 15 Global Step: 187770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:00,736-Speed 3300.72 samples/sec Loss 1.6169 LearningRate 0.0060 Epoch: 15 Global Step: 187780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:01:03,823-Speed 3317.93 samples/sec Loss 1.6057 LearningRate 0.0060 Epoch: 15 Global Step: 187790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:01:06,943-Speed 3282.80 samples/sec Loss 1.6805 LearningRate 0.0060 Epoch: 15 Global Step: 187800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:09,991-Speed 3360.88 samples/sec Loss 1.6578 LearningRate 0.0060 Epoch: 15 Global Step: 187810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:13,055-Speed 3343.77 samples/sec Loss 1.6213 LearningRate 0.0060 Epoch: 15 Global Step: 187820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:16,103-Speed 3360.50 samples/sec Loss 1.5963 LearningRate 0.0059 Epoch: 15 Global Step: 187830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:19,173-Speed 3336.45 samples/sec Loss 1.6978 LearningRate 0.0059 Epoch: 15 Global Step: 187840 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:22,251-Speed 3328.25 samples/sec Loss 1.6303 LearningRate 0.0059 Epoch: 15 Global Step: 187850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:25,446-Speed 3205.83 samples/sec Loss 1.6032 LearningRate 0.0059 Epoch: 15 Global Step: 187860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:28,658-Speed 3189.06 samples/sec Loss 1.6241 LearningRate 0.0059 Epoch: 15 Global Step: 187870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:31,787-Speed 3273.59 samples/sec Loss 1.6707 LearningRate 0.0059 Epoch: 15 Global Step: 187880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:34,927-Speed 3261.57 samples/sec Loss 1.6373 LearningRate 0.0059 Epoch: 15 Global Step: 187890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:38,076-Speed 3253.69 samples/sec Loss 1.6506 LearningRate 0.0059 Epoch: 15 Global Step: 187900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:01:41,182-Speed 3297.22 samples/sec Loss 1.6631 LearningRate 0.0059 Epoch: 15 Global Step: 187910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:01:44,251-Speed 3338.45 samples/sec Loss 1.6119 LearningRate 0.0059 Epoch: 15 Global Step: 187920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:01:47,310-Speed 3348.05 samples/sec Loss 1.6806 LearningRate 0.0059 Epoch: 15 Global Step: 187930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:50,400-Speed 3315.20 samples/sec Loss 1.6721 LearningRate 0.0059 Epoch: 15 Global Step: 187940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:53,555-Speed 3246.22 samples/sec Loss 1.6434 LearningRate 0.0059 Epoch: 15 Global Step: 187950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:01:56,678-Speed 3279.82 samples/sec Loss 1.6213 LearningRate 0.0059 Epoch: 15 Global Step: 187960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 18:01:59,791-Speed 3291.15 samples/sec Loss 1.6706 LearningRate 0.0059 Epoch: 15 Global Step: 187970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:02:02,949-Speed 3243.37 samples/sec Loss 1.6791 LearningRate 0.0059 Epoch: 15 Global Step: 187980 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:06,144-Speed 3205.91 samples/sec Loss 1.6211 LearningRate 0.0059 Epoch: 15 Global Step: 187990 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:09,224-Speed 3325.10 samples/sec Loss 1.5932 LearningRate 0.0059 Epoch: 15 Global Step: 188000 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:12,341-Speed 3286.25 samples/sec Loss 1.5813 LearningRate 0.0059 Epoch: 15 Global Step: 188010 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:15,576-Speed 3167.04 samples/sec Loss 1.6216 LearningRate 0.0059 Epoch: 15 Global Step: 188020 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:18,629-Speed 3354.78 samples/sec Loss 1.6124 LearningRate 0.0059 Epoch: 15 Global Step: 188030 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:21,689-Speed 3347.44 samples/sec Loss 1.5972 LearningRate 0.0059 Epoch: 15 Global Step: 188040 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:24,788-Speed 3305.90 samples/sec Loss 1.6914 LearningRate 0.0059 Epoch: 15 Global Step: 188050 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:27,852-Speed 3342.52 samples/sec Loss 1.6892 LearningRate 0.0059 Epoch: 15 Global Step: 188060 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:30,962-Speed 3294.33 samples/sec Loss 1.5786 LearningRate 0.0059 Epoch: 15 Global Step: 188070 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:34,057-Speed 3309.42 samples/sec Loss 1.6177 LearningRate 0.0059 Epoch: 15 Global Step: 188080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:02:37,141-Speed 3321.55 samples/sec 
Loss 1.6479 LearningRate 0.0059 Epoch: 15 Global Step: 188090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:02:40,236-Speed 3308.89 samples/sec Loss 1.6413 LearningRate 0.0059 Epoch: 15 Global Step: 188100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:02:43,349-Speed 3291.00 samples/sec Loss 1.6510 LearningRate 0.0059 Epoch: 15 Global Step: 188110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:02:46,409-Speed 3348.78 samples/sec Loss 1.6421 LearningRate 0.0059 Epoch: 15 Global Step: 188120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:02:49,475-Speed 3340.51 samples/sec Loss 1.6960 LearningRate 0.0059 Epoch: 15 Global Step: 188130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:02:52,555-Speed 3326.05 samples/sec Loss 1.6070 LearningRate 0.0059 Epoch: 15 Global Step: 188140 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:55,667-Speed 3292.04 samples/sec Loss 1.6099 LearningRate 0.0059 Epoch: 15 Global Step: 188150 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:02:58,754-Speed 3318.20 samples/sec Loss 1.5885 LearningRate 0.0059 Epoch: 15 Global Step: 188160 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:01,898-Speed 3257.71 samples/sec Loss 1.6861 LearningRate 0.0059 Epoch: 15 Global Step: 188170 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:04,958-Speed 3347.48 samples/sec Loss 1.6476 LearningRate 0.0059 Epoch: 15 Global Step: 188180 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:08,067-Speed 3295.27 samples/sec Loss 1.6881 LearningRate 0.0059 Epoch: 15 Global Step: 188190 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:11,169-Speed 3301.92 samples/sec Loss 1.6399 LearningRate 0.0059 Epoch: 15 Global Step: 188200 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:14,286-Speed 3285.79 samples/sec Loss 1.6185 LearningRate 0.0059 Epoch: 15 Global 
Step: 188210 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:17,420-Speed 3269.24 samples/sec Loss 1.6181 LearningRate 0.0059 Epoch: 15 Global Step: 188220 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:20,498-Speed 3327.64 samples/sec Loss 1.6254 LearningRate 0.0059 Epoch: 15 Global Step: 188230 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:03:23,573-Speed 3331.17 samples/sec Loss 1.6381 LearningRate 0.0059 Epoch: 15 Global Step: 188240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:26,737-Speed 3237.36 samples/sec Loss 1.6662 LearningRate 0.0059 Epoch: 15 Global Step: 188250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:29,824-Speed 3318.13 samples/sec Loss 1.6181 LearningRate 0.0059 Epoch: 15 Global Step: 188260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:32,940-Speed 3287.71 samples/sec Loss 1.6247 LearningRate 0.0059 Epoch: 15 Global Step: 188270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:36,044-Speed 3299.93 samples/sec Loss 1.6248 LearningRate 0.0059 Epoch: 15 Global Step: 188280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:39,283-Speed 3162.97 samples/sec Loss 1.6893 LearningRate 0.0059 Epoch: 15 Global Step: 188290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:42,372-Speed 3316.24 samples/sec Loss 1.6897 LearningRate 0.0059 Epoch: 15 Global Step: 188300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:45,463-Speed 3313.45 samples/sec Loss 1.7161 LearningRate 0.0059 Epoch: 15 Global Step: 188310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:48,559-Speed 3308.58 samples/sec Loss 1.6613 LearningRate 0.0059 Epoch: 15 Global Step: 188320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:03:51,689-Speed 3273.10 samples/sec Loss 1.6817 LearningRate 0.0059 Epoch: 15 Global Step: 188330 Fp16 Grad Scale: 16384 Required: 
5 hours Training: 2022-04-27 18:03:54,817-Speed 3275.03 samples/sec Loss 1.6616 LearningRate 0.0058 Epoch: 15 Global Step: 188340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:03:57,890-Speed 3333.21 samples/sec Loss 1.6016 LearningRate 0.0058 Epoch: 15 Global Step: 188350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:00,981-Speed 3313.68 samples/sec Loss 1.6869 LearningRate 0.0058 Epoch: 15 Global Step: 188360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:04,100-Speed 3284.56 samples/sec Loss 1.6665 LearningRate 0.0058 Epoch: 15 Global Step: 188370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:07,202-Speed 3301.85 samples/sec Loss 1.6434 LearningRate 0.0058 Epoch: 15 Global Step: 188380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:10,282-Speed 3325.60 samples/sec Loss 1.6461 LearningRate 0.0058 Epoch: 15 Global Step: 188390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:13,428-Speed 3256.66 samples/sec Loss 1.6415 LearningRate 0.0058 Epoch: 15 Global Step: 188400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:16,558-Speed 3272.42 samples/sec Loss 1.7215 LearningRate 0.0058 Epoch: 15 Global Step: 188410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:19,696-Speed 3264.93 samples/sec Loss 1.6370 LearningRate 0.0058 Epoch: 15 Global Step: 188420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:22,770-Speed 3331.98 samples/sec Loss 1.6919 LearningRate 0.0058 Epoch: 15 Global Step: 188430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:25,841-Speed 3335.00 samples/sec Loss 1.6509 LearningRate 0.0058 Epoch: 15 Global Step: 188440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:29,006-Speed 3236.52 samples/sec Loss 1.6703 LearningRate 0.0058 Epoch: 15 Global Step: 188450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
18:04:32,084-Speed 3328.31 samples/sec Loss 1.5892 LearningRate 0.0058 Epoch: 15 Global Step: 188460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:04:35,163-Speed 3327.20 samples/sec Loss 1.6652 LearningRate 0.0058 Epoch: 15 Global Step: 188470 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:04:38,256-Speed 3311.35 samples/sec Loss 1.6529 LearningRate 0.0058 Epoch: 15 Global Step: 188480 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:04:41,406-Speed 3251.97 samples/sec Loss 1.6245 LearningRate 0.0058 Epoch: 15 Global Step: 188490 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:04:44,474-Speed 3339.30 samples/sec Loss 1.6519 LearningRate 0.0058 Epoch: 15 Global Step: 188500 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:04:47,576-Speed 3301.75 samples/sec Loss 1.6243 LearningRate 0.0058 Epoch: 15 Global Step: 188510 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:04:50,721-Speed 3257.37 samples/sec Loss 1.6251 LearningRate 0.0058 Epoch: 15 Global Step: 188520 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:04:53,942-Speed 3180.17 samples/sec Loss 1.7106 LearningRate 0.0058 Epoch: 15 Global Step: 188530 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:04:57,066-Speed 3278.84 samples/sec Loss 1.6818 LearningRate 0.0058 Epoch: 15 Global Step: 188540 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:05:00,142-Speed 3330.20 samples/sec Loss 1.6477 LearningRate 0.0058 Epoch: 15 Global Step: 188550 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:05:03,307-Speed 3236.08 samples/sec Loss 1.6287 LearningRate 0.0058 Epoch: 15 Global Step: 188560 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:05:06,449-Speed 3260.44 samples/sec Loss 1.6294 LearningRate 0.0058 Epoch: 15 Global Step: 188570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:09,509-Speed 3347.21 samples/sec Loss 1.6782 
LearningRate 0.0058 Epoch: 15 Global Step: 188580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:12,591-Speed 3323.60 samples/sec Loss 1.7033 LearningRate 0.0058 Epoch: 15 Global Step: 188590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:15,700-Speed 3294.72 samples/sec Loss 1.6170 LearningRate 0.0058 Epoch: 15 Global Step: 188600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:18,855-Speed 3246.64 samples/sec Loss 1.6528 LearningRate 0.0058 Epoch: 15 Global Step: 188610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:21,904-Speed 3359.98 samples/sec Loss 1.6497 LearningRate 0.0058 Epoch: 15 Global Step: 188620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:25,011-Speed 3297.45 samples/sec Loss 1.6813 LearningRate 0.0058 Epoch: 15 Global Step: 188630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:28,127-Speed 3287.20 samples/sec Loss 1.6384 LearningRate 0.0058 Epoch: 15 Global Step: 188640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:31,268-Speed 3261.10 samples/sec Loss 1.6368 LearningRate 0.0058 Epoch: 15 Global Step: 188650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:34,325-Speed 3350.07 samples/sec Loss 1.6238 LearningRate 0.0058 Epoch: 15 Global Step: 188660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:37,444-Speed 3284.18 samples/sec Loss 1.6267 LearningRate 0.0058 Epoch: 15 Global Step: 188670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:05:40,502-Speed 3349.60 samples/sec Loss 1.6614 LearningRate 0.0058 Epoch: 15 Global Step: 188680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:43,607-Speed 3299.80 samples/sec Loss 1.6429 LearningRate 0.0058 Epoch: 15 Global Step: 188690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:46,705-Speed 3306.30 samples/sec Loss 1.6394 LearningRate 0.0058 Epoch: 15 Global Step: 
188700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:49,854-Speed 3252.94 samples/sec Loss 1.6631 LearningRate 0.0058 Epoch: 15 Global Step: 188710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:52,940-Speed 3318.72 samples/sec Loss 1.7188 LearningRate 0.0058 Epoch: 15 Global Step: 188720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:56,001-Speed 3346.41 samples/sec Loss 1.6654 LearningRate 0.0058 Epoch: 15 Global Step: 188730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:05:59,075-Speed 3332.26 samples/sec Loss 1.6805 LearningRate 0.0058 Epoch: 15 Global Step: 188740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:02,220-Speed 3257.37 samples/sec Loss 1.7008 LearningRate 0.0058 Epoch: 15 Global Step: 188750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:05,293-Speed 3333.05 samples/sec Loss 1.6912 LearningRate 0.0058 Epoch: 15 Global Step: 188760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:08,398-Speed 3299.18 samples/sec Loss 1.6517 LearningRate 0.0058 Epoch: 15 Global Step: 188770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:11,467-Speed 3337.12 samples/sec Loss 1.6077 LearningRate 0.0058 Epoch: 15 Global Step: 188780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:06:14,536-Speed 3337.71 samples/sec Loss 1.6757 LearningRate 0.0058 Epoch: 15 Global Step: 188790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:06:17,645-Speed 3294.62 samples/sec Loss 1.7161 LearningRate 0.0058 Epoch: 15 Global Step: 188800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:06:20,699-Speed 3353.92 samples/sec Loss 1.6293 LearningRate 0.0058 Epoch: 15 Global Step: 188810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:06:23,770-Speed 3335.68 samples/sec Loss 1.6917 LearningRate 0.0058 Epoch: 15 Global Step: 188820 Fp16 Grad Scale: 32768 Required: 5 
hours Training: 2022-04-27 18:06:26,809-Speed 3370.22 samples/sec Loss 1.5846 LearningRate 0.0058 Epoch: 15 Global Step: 188830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:29,886-Speed 3329.76 samples/sec Loss 1.6946 LearningRate 0.0058 Epoch: 15 Global Step: 188840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:32,955-Speed 3337.21 samples/sec Loss 1.6930 LearningRate 0.0058 Epoch: 15 Global Step: 188850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:36,074-Speed 3284.77 samples/sec Loss 1.6964 LearningRate 0.0057 Epoch: 15 Global Step: 188860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:39,148-Speed 3332.59 samples/sec Loss 1.6881 LearningRate 0.0057 Epoch: 15 Global Step: 188870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:42,248-Speed 3303.97 samples/sec Loss 1.6561 LearningRate 0.0057 Epoch: 15 Global Step: 188880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:45,295-Speed 3361.68 samples/sec Loss 1.6731 LearningRate 0.0057 Epoch: 15 Global Step: 188890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:48,453-Speed 3243.49 samples/sec Loss 1.6765 LearningRate 0.0057 Epoch: 15 Global Step: 188900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:51,565-Speed 3292.02 samples/sec Loss 1.7011 LearningRate 0.0057 Epoch: 15 Global Step: 188910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:54,646-Speed 3324.48 samples/sec Loss 1.7274 LearningRate 0.0057 Epoch: 15 Global Step: 188920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:06:57,700-Speed 3354.39 samples/sec Loss 1.7490 LearningRate 0.0057 Epoch: 15 Global Step: 188930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:07:00,817-Speed 3286.17 samples/sec Loss 1.7169 LearningRate 0.0057 Epoch: 15 Global Step: 188940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
18:07:03,900-Speed 3322.24 samples/sec Loss 1.6748 LearningRate 0.0057 Epoch: 15 Global Step: 188950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:07:07,019-Speed 3284.25 samples/sec Loss 1.6823 LearningRate 0.0057 Epoch: 15 Global Step: 188960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:07:10,080-Speed 3346.93 samples/sec Loss 1.6708 LearningRate 0.0057 Epoch: 15 Global Step: 188970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:13,153-Speed 3332.60 samples/sec Loss 1.6639 LearningRate 0.0057 Epoch: 15 Global Step: 188980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:16,243-Speed 3315.51 samples/sec Loss 1.6847 LearningRate 0.0057 Epoch: 15 Global Step: 188990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:19,335-Speed 3312.26 samples/sec Loss 1.6819 LearningRate 0.0057 Epoch: 15 Global Step: 189000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:22,456-Speed 3282.35 samples/sec Loss 1.6573 LearningRate 0.0057 Epoch: 15 Global Step: 189010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:25,576-Speed 3283.03 samples/sec Loss 1.7037 LearningRate 0.0057 Epoch: 15 Global Step: 189020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:28,682-Speed 3298.40 samples/sec Loss 1.6570 LearningRate 0.0057 Epoch: 15 Global Step: 189030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:31,801-Speed 3283.90 samples/sec Loss 1.6482 LearningRate 0.0057 Epoch: 15 Global Step: 189040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:34,876-Speed 3330.74 samples/sec Loss 1.6452 LearningRate 0.0057 Epoch: 15 Global Step: 189050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:37,970-Speed 3310.95 samples/sec Loss 1.6831 LearningRate 0.0057 Epoch: 15 Global Step: 189060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:41,109-Speed 3263.19 samples/sec Loss 
1.6835 LearningRate 0.0057 Epoch: 15 Global Step: 189070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:07:44,232-Speed 3280.18 samples/sec Loss 1.6567 LearningRate 0.0057 Epoch: 15 Global Step: 189080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:47,287-Speed 3353.71 samples/sec Loss 1.7417 LearningRate 0.0057 Epoch: 15 Global Step: 189090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:50,413-Speed 3276.26 samples/sec Loss 1.6845 LearningRate 0.0057 Epoch: 15 Global Step: 189100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:53,510-Speed 3307.49 samples/sec Loss 1.6561 LearningRate 0.0057 Epoch: 15 Global Step: 189110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:56,643-Speed 3270.06 samples/sec Loss 1.7033 LearningRate 0.0057 Epoch: 15 Global Step: 189120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:07:59,728-Speed 3319.49 samples/sec Loss 1.6513 LearningRate 0.0057 Epoch: 15 Global Step: 189130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:02,854-Speed 3277.63 samples/sec Loss 1.7001 LearningRate 0.0057 Epoch: 15 Global Step: 189140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:05,931-Speed 3328.83 samples/sec Loss 1.6479 LearningRate 0.0057 Epoch: 15 Global Step: 189150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:08,980-Speed 3359.11 samples/sec Loss 1.6405 LearningRate 0.0057 Epoch: 15 Global Step: 189160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:12,165-Speed 3216.42 samples/sec Loss 1.6729 LearningRate 0.0057 Epoch: 15 Global Step: 189170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:15,212-Speed 3361.34 samples/sec Loss 1.6437 LearningRate 0.0057 Epoch: 15 Global Step: 189180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:18,333-Speed 3282.64 samples/sec Loss 1.6862 LearningRate 0.0057 Epoch: 15 Global 
Step: 189190 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:21,419-Speed 3318.85 samples/sec Loss 1.6780 LearningRate 0.0057 Epoch: 15 Global Step: 189200 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:24,510-Speed 3313.23 samples/sec Loss 1.6475 LearningRate 0.0057 Epoch: 15 Global Step: 189210 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:27,633-Speed 3280.91 samples/sec Loss 1.6505 LearningRate 0.0057 Epoch: 15 Global Step: 189220 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:30,723-Speed 3314.75 samples/sec Loss 1.7635 LearningRate 0.0057 Epoch: 15 Global Step: 189230 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:33,809-Speed 3318.59 samples/sec Loss 1.6331 LearningRate 0.0057 Epoch: 15 Global Step: 189240 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:36,910-Speed 3303.46 samples/sec Loss 1.6562 LearningRate 0.0057 Epoch: 15 Global Step: 189250 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:40,043-Speed 3269.09 samples/sec Loss 1.6704 LearningRate 0.0057 Epoch: 15 Global Step: 189260 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:43,212-Speed 3232.95 samples/sec Loss 1.6571 LearningRate 0.0057 Epoch: 15 Global Step: 189270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:46,298-Speed 3319.85 samples/sec Loss 1.6483 LearningRate 0.0057 Epoch: 15 Global Step: 189280 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:08:49,531-Speed 3167.98 samples/sec Loss 1.6634 LearningRate 0.0057 Epoch: 15 Global Step: 189290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:52,630-Speed 3304.89 samples/sec Loss 1.6767 LearningRate 0.0057 Epoch: 15 Global Step: 189300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:08:55,727-Speed 3307.99 samples/sec Loss 1.6968 LearningRate 0.0057 Epoch: 15 Global Step: 189310 Fp16 Grad Scale: 16384 Required: 5 hours 
Training: 2022-04-27 18:08:58,785-Speed 3349.27 samples/sec Loss 1.6381 LearningRate 0.0057 Epoch: 15 Global Step: 189320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:01,887-Speed 3301.55 samples/sec Loss 1.7250 LearningRate 0.0057 Epoch: 15 Global Step: 189330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:05,039-Speed 3250.47 samples/sec Loss 1.6994 LearningRate 0.0057 Epoch: 15 Global Step: 189340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:08,112-Speed 3333.30 samples/sec Loss 1.6666 LearningRate 0.0057 Epoch: 15 Global Step: 189350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:11,193-Speed 3324.69 samples/sec Loss 1.7144 LearningRate 0.0057 Epoch: 15 Global Step: 189360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:14,307-Speed 3288.49 samples/sec Loss 1.6564 LearningRate 0.0057 Epoch: 15 Global Step: 189370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:17,380-Speed 3333.72 samples/sec Loss 1.6802 LearningRate 0.0056 Epoch: 15 Global Step: 189380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:20,523-Speed 3259.81 samples/sec Loss 1.6445 LearningRate 0.0056 Epoch: 15 Global Step: 189390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:09:23,597-Speed 3331.43 samples/sec Loss 1.6318 LearningRate 0.0056 Epoch: 15 Global Step: 189400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:09:26,680-Speed 3323.18 samples/sec Loss 1.6692 LearningRate 0.0056 Epoch: 15 Global Step: 189410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:09:29,764-Speed 3320.42 samples/sec Loss 1.6932 LearningRate 0.0056 Epoch: 15 Global Step: 189420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:09:32,819-Speed 3354.29 samples/sec Loss 1.6222 LearningRate 0.0056 Epoch: 15 Global Step: 189430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:09:35,889-Speed 
3336.55 samples/sec Loss 1.7544 LearningRate 0.0056 Epoch: 15 Global Step: 189440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:09:38,969-Speed 3325.34 samples/sec Loss 1.7084 LearningRate 0.0056 Epoch: 15 Global Step: 189450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:09:42,175-Speed 3194.56 samples/sec Loss 1.6687 LearningRate 0.0056 Epoch: 15 Global Step: 189460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:45,293-Speed 3285.40 samples/sec Loss 1.6332 LearningRate 0.0056 Epoch: 15 Global Step: 189470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:48,434-Speed 3261.09 samples/sec Loss 1.7205 LearningRate 0.0056 Epoch: 15 Global Step: 189480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:51,541-Speed 3296.67 samples/sec Loss 1.6706 LearningRate 0.0056 Epoch: 15 Global Step: 189490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:54,734-Speed 3208.22 samples/sec Loss 1.6567 LearningRate 0.0056 Epoch: 15 Global Step: 189500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:09:57,790-Speed 3351.89 samples/sec Loss 1.6764 LearningRate 0.0056 Epoch: 15 Global Step: 189510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:10:00,885-Speed 3308.91 samples/sec Loss 1.7209 LearningRate 0.0056 Epoch: 15 Global Step: 189520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:10:03,976-Speed 3314.45 samples/sec Loss 1.6753 LearningRate 0.0056 Epoch: 15 Global Step: 189530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:10:07,109-Speed 3269.39 samples/sec Loss 1.7258 LearningRate 0.0056 Epoch: 15 Global Step: 189540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:10:10,176-Speed 3339.69 samples/sec Loss 1.6470 LearningRate 0.0056 Epoch: 15 Global Step: 189550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:10:13,281-Speed 3299.25 samples/sec Loss 1.6643 
LearningRate 0.0056 Epoch: 15 Global Step: 189560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:16,403-Speed 3280.72 samples/sec Loss 1.7231 LearningRate 0.0056 Epoch: 15 Global Step: 189570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:19,513-Speed 3294.38 samples/sec Loss 1.6885 LearningRate 0.0056 Epoch: 15 Global Step: 189580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:22,565-Speed 3355.49 samples/sec Loss 1.7302 LearningRate 0.0056 Epoch: 15 Global Step: 189590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:25,645-Speed 3325.55 samples/sec Loss 1.7179 LearningRate 0.0056 Epoch: 15 Global Step: 189600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:28,765-Speed 3283.31 samples/sec Loss 1.6694 LearningRate 0.0056 Epoch: 15 Global Step: 189610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:31,826-Speed 3346.68 samples/sec Loss 1.7027 LearningRate 0.0056 Epoch: 15 Global Step: 189620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:34,904-Speed 3327.88 samples/sec Loss 1.7167 LearningRate 0.0056 Epoch: 15 Global Step: 189630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:37,989-Speed 3320.00 samples/sec Loss 1.6126 LearningRate 0.0056 Epoch: 15 Global Step: 189640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:41,087-Speed 3306.87 samples/sec Loss 1.6995 LearningRate 0.0056 Epoch: 15 Global Step: 189650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:44,173-Speed 3318.78 samples/sec Loss 1.7220 LearningRate 0.0056 Epoch: 15 Global Step: 189660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 18:10:47,245-Speed 3334.68 samples/sec Loss 1.6765 LearningRate 0.0056 Epoch: 15 Global Step: 189670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:50,318-Speed 3333.94 samples/sec Loss 1.7278 LearningRate 0.0056 Epoch: 15 Global Step: 
189680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:53,402-Speed 3320.46 samples/sec Loss 1.6723 LearningRate 0.0056 Epoch: 15 Global Step: 189690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:56,507-Speed 3299.21 samples/sec Loss 1.6669 LearningRate 0.0056 Epoch: 15 Global Step: 189700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:10:59,619-Speed 3291.52 samples/sec Loss 1.7033 LearningRate 0.0056 Epoch: 15 Global Step: 189710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:02,758-Speed 3263.64 samples/sec Loss 1.6949 LearningRate 0.0056 Epoch: 15 Global Step: 189720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:05,911-Speed 3248.33 samples/sec Loss 1.6630 LearningRate 0.0056 Epoch: 15 Global Step: 189730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:08,979-Speed 3338.73 samples/sec Loss 1.7134 LearningRate 0.0056 Epoch: 15 Global Step: 189740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:12,050-Speed 3335.84 samples/sec Loss 1.6735 LearningRate 0.0056 Epoch: 15 Global Step: 189750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:15,124-Speed 3332.50 samples/sec Loss 1.6551 LearningRate 0.0056 Epoch: 15 Global Step: 189760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:18,232-Speed 3295.58 samples/sec Loss 1.7559 LearningRate 0.0056 Epoch: 15 Global Step: 189770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:21,323-Speed 3313.59 samples/sec Loss 1.7472 LearningRate 0.0056 Epoch: 15 Global Step: 189780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:24,439-Speed 3287.04 samples/sec Loss 1.6546 LearningRate 0.0056 Epoch: 15 Global Step: 189790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:27,565-Speed 3277.28 samples/sec Loss 1.7278 LearningRate 0.0056 Epoch: 15 Global Step: 189800 Fp16 Grad Scale: 16384 Required: 5 
hours Training: 2022-04-27 18:11:30,675-Speed 3293.78 samples/sec Loss 1.7449 LearningRate 0.0056 Epoch: 15 Global Step: 189810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:11:33,746-Speed 3334.62 samples/sec Loss 1.7335 LearningRate 0.0056 Epoch: 15 Global Step: 189820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:36,916-Speed 3231.59 samples/sec Loss 1.6302 LearningRate 0.0056 Epoch: 15 Global Step: 189830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:40,022-Speed 3297.90 samples/sec Loss 1.7112 LearningRate 0.0056 Epoch: 15 Global Step: 189840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:43,087-Speed 3342.43 samples/sec Loss 1.7223 LearningRate 0.0056 Epoch: 15 Global Step: 189850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:46,249-Speed 3239.42 samples/sec Loss 1.7034 LearningRate 0.0056 Epoch: 15 Global Step: 189860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:49,329-Speed 3325.32 samples/sec Loss 1.6687 LearningRate 0.0056 Epoch: 15 Global Step: 189870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:11:52,436-Speed 3296.98 samples/sec Loss 1.7622 LearningRate 0.0056 Epoch: 15 Global Step: 189880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:11:55,551-Speed 3288.06 samples/sec Loss 1.6449 LearningRate 0.0056 Epoch: 15 Global Step: 189890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:11:58,675-Speed 3278.30 samples/sec Loss 1.6754 LearningRate 0.0055 Epoch: 15 Global Step: 189900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:01,822-Speed 3255.88 samples/sec Loss 1.7331 LearningRate 0.0055 Epoch: 15 Global Step: 189910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:04,949-Speed 3274.85 samples/sec Loss 1.7313 LearningRate 0.0055 Epoch: 15 Global Step: 189920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:08,015-Speed 
3341.12 samples/sec Loss 1.7484 LearningRate 0.0055 Epoch: 15 Global Step: 189930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:11,124-Speed 3294.66 samples/sec Loss 1.7217 LearningRate 0.0055 Epoch: 15 Global Step: 189940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:14,240-Speed 3288.21 samples/sec Loss 1.7723 LearningRate 0.0055 Epoch: 15 Global Step: 189950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:17,322-Speed 3323.36 samples/sec Loss 1.7970 LearningRate 0.0055 Epoch: 15 Global Step: 189960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:20,404-Speed 3323.70 samples/sec Loss 1.7551 LearningRate 0.0055 Epoch: 15 Global Step: 189970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:12:23,482-Speed 3328.27 samples/sec Loss 1.7292 LearningRate 0.0055 Epoch: 15 Global Step: 189980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:26,586-Speed 3300.23 samples/sec Loss 1.6555 LearningRate 0.0055 Epoch: 15 Global Step: 189990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:29,709-Speed 3279.69 samples/sec Loss 1.7029 LearningRate 0.0055 Epoch: 15 Global Step: 190000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:32,811-Speed 3301.48 samples/sec Loss 1.6750 LearningRate 0.0055 Epoch: 15 Global Step: 190010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:35,935-Speed 3279.27 samples/sec Loss 1.7132 LearningRate 0.0055 Epoch: 15 Global Step: 190020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:39,024-Speed 3315.56 samples/sec Loss 1.7076 LearningRate 0.0055 Epoch: 15 Global Step: 190030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:42,136-Speed 3291.71 samples/sec Loss 1.6692 LearningRate 0.0055 Epoch: 15 Global Step: 190040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:45,216-Speed 3325.93 samples/sec Loss 1.7231 LearningRate 
0.0055 Epoch: 15 Global Step: 190050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:48,292-Speed 3330.42 samples/sec Loss 1.6471 LearningRate 0.0055 Epoch: 15 Global Step: 190060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:51,363-Speed 3335.04 samples/sec Loss 1.7512 LearningRate 0.0055 Epoch: 15 Global Step: 190070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:12:54,474-Speed 3293.27 samples/sec Loss 1.6975 LearningRate 0.0055 Epoch: 15 Global Step: 190080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:12:57,542-Speed 3338.51 samples/sec Loss 1.7408 LearningRate 0.0055 Epoch: 15 Global Step: 190090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:13:00,631-Speed 3315.48 samples/sec Loss 1.7083 LearningRate 0.0055 Epoch: 15 Global Step: 190100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:03,734-Speed 3301.11 samples/sec Loss 1.6591 LearningRate 0.0055 Epoch: 15 Global Step: 190110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:06,817-Speed 3322.45 samples/sec Loss 1.6639 LearningRate 0.0055 Epoch: 15 Global Step: 190120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:09,899-Speed 3323.77 samples/sec Loss 1.7216 LearningRate 0.0055 Epoch: 15 Global Step: 190130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:12,986-Speed 3318.25 samples/sec Loss 1.7468 LearningRate 0.0055 Epoch: 15 Global Step: 190140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:16,085-Speed 3304.41 samples/sec Loss 1.7301 LearningRate 0.0055 Epoch: 15 Global Step: 190150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:19,195-Speed 3294.55 samples/sec Loss 1.6660 LearningRate 0.0055 Epoch: 15 Global Step: 190160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:22,281-Speed 3318.54 samples/sec Loss 1.7544 LearningRate 0.0055 Epoch: 15 Global Step: 190170 Fp16 
Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:25,417-Speed 3266.19 samples/sec Loss 1.7268 LearningRate 0.0055 Epoch: 15 Global Step: 190180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:28,607-Speed 3211.61 samples/sec Loss 1.7158 LearningRate 0.0055 Epoch: 15 Global Step: 190190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:13:31,734-Speed 3275.90 samples/sec Loss 1.7522 LearningRate 0.0055 Epoch: 15 Global Step: 190200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:13:34,814-Speed 3325.57 samples/sec Loss 1.7417 LearningRate 0.0055 Epoch: 15 Global Step: 190210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:13:37,908-Speed 3310.41 samples/sec Loss 1.6834 LearningRate 0.0055 Epoch: 15 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:13:40,994-Speed 3318.37 samples/sec Loss 1.6993 LearningRate 0.0055 Epoch: 15 Global Step: 190230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:13:44,100-Speed 3298.56 samples/sec Loss 1.7450 LearningRate 0.0055 Epoch: 15 Global Step: 190240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:13:47,183-Speed 3322.51 samples/sec Loss 1.7382 LearningRate 0.0055 Epoch: 15 Global Step: 190250 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:13:50,277-Speed 3309.89 samples/sec Loss 1.7066 LearningRate 0.0055 Epoch: 15 Global Step: 190260 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:13:53,465-Speed 3214.40 samples/sec Loss 1.6962 LearningRate 0.0055 Epoch: 15 Global Step: 190270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:13:56,581-Speed 3286.55 samples/sec Loss 1.7128 LearningRate 0.0055 Epoch: 15 Global Step: 190280 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:13:59,683-Speed 3302.54 samples/sec Loss 1.7378 LearningRate 0.0055 Epoch: 15 Global Step: 190290 Fp16 Grad Scale: 8192 Required: 5 hours Training: 
2022-04-27 18:14:02,787-Speed 3299.41 samples/sec Loss 1.7063 LearningRate 0.0055 Epoch: 15 Global Step: 190300 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:14:05,870-Speed 3322.45 samples/sec Loss 1.6615 LearningRate 0.0055 Epoch: 15 Global Step: 190310 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:14:08,984-Speed 3290.15 samples/sec Loss 1.7688 LearningRate 0.0055 Epoch: 15 Global Step: 190320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:14:12,139-Speed 3246.58 samples/sec Loss 1.7356 LearningRate 0.0055 Epoch: 15 Global Step: 190330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:14:15,350-Speed 3189.25 samples/sec Loss 1.6968 LearningRate 0.0055 Epoch: 15 Global Step: 190340 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:14:18,459-Speed 3294.89 samples/sec Loss 1.7014 LearningRate 0.0055 Epoch: 15 Global Step: 190350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:21,530-Speed 3335.70 samples/sec Loss 1.6839 LearningRate 0.0055 Epoch: 15 Global Step: 190360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:24,645-Speed 3288.70 samples/sec Loss 1.7389 LearningRate 0.0055 Epoch: 15 Global Step: 190370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:27,857-Speed 3188.21 samples/sec Loss 1.7037 LearningRate 0.0055 Epoch: 15 Global Step: 190380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:30,976-Speed 3285.05 samples/sec Loss 1.7929 LearningRate 0.0055 Epoch: 15 Global Step: 190390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:34,058-Speed 3322.63 samples/sec Loss 1.6975 LearningRate 0.0055 Epoch: 15 Global Step: 190400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:37,131-Speed 3333.55 samples/sec Loss 1.6793 LearningRate 0.0055 Epoch: 15 Global Step: 190410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:40,297-Speed 3235.16 
samples/sec Loss 1.7350 LearningRate 0.0055 Epoch: 15 Global Step: 190420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:43,394-Speed 3307.33 samples/sec Loss 1.7066 LearningRate 0.0054 Epoch: 15 Global Step: 190430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:46,467-Speed 3333.65 samples/sec Loss 1.7219 LearningRate 0.0054 Epoch: 15 Global Step: 190440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:14:49,550-Speed 3322.84 samples/sec Loss 1.6874 LearningRate 0.0054 Epoch: 15 Global Step: 190450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:14:52,632-Speed 3324.13 samples/sec Loss 1.7393 LearningRate 0.0054 Epoch: 15 Global Step: 190460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:14:55,738-Speed 3297.20 samples/sec Loss 1.6884 LearningRate 0.0054 Epoch: 15 Global Step: 190470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:14:58,848-Speed 3293.92 samples/sec Loss 1.7236 LearningRate 0.0054 Epoch: 15 Global Step: 190480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:15:02,020-Speed 3229.82 samples/sec Loss 1.7127 LearningRate 0.0054 Epoch: 15 Global Step: 190490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:15:05,116-Speed 3307.87 samples/sec Loss 1.7233 LearningRate 0.0054 Epoch: 15 Global Step: 190500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:08,286-Speed 3230.95 samples/sec Loss 1.8085 LearningRate 0.0054 Epoch: 15 Global Step: 190510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:11,403-Speed 3286.69 samples/sec Loss 1.7494 LearningRate 0.0054 Epoch: 15 Global Step: 190520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:14,505-Speed 3301.90 samples/sec Loss 1.7127 LearningRate 0.0054 Epoch: 15 Global Step: 190530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:17,610-Speed 3299.06 samples/sec Loss 1.7244 LearningRate 0.0054 
Epoch: 15 Global Step: 190540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:20,719-Speed 3294.21 samples/sec Loss 1.7105 LearningRate 0.0054 Epoch: 15 Global Step: 190550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:23,846-Speed 3276.48 samples/sec Loss 1.6731 LearningRate 0.0054 Epoch: 15 Global Step: 190560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:27,022-Speed 3224.74 samples/sec Loss 1.7228 LearningRate 0.0054 Epoch: 15 Global Step: 190570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:30,170-Speed 3253.68 samples/sec Loss 1.7161 LearningRate 0.0054 Epoch: 15 Global Step: 190580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:15:33,250-Speed 3326.26 samples/sec Loss 1.7267 LearningRate 0.0054 Epoch: 15 Global Step: 190590 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:15:36,394-Speed 3257.92 samples/sec Loss 1.6955 LearningRate 0.0054 Epoch: 15 Global Step: 190600 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:15:39,538-Speed 3257.46 samples/sec Loss 1.7108 LearningRate 0.0054 Epoch: 15 Global Step: 190610 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:15:42,610-Speed 3334.40 samples/sec Loss 1.7350 LearningRate 0.0054 Epoch: 15 Global Step: 190620 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:15:45,679-Speed 3338.37 samples/sec Loss 1.7021 LearningRate 0.0054 Epoch: 15 Global Step: 190630 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:15:48,784-Speed 3299.03 samples/sec Loss 1.7544 LearningRate 0.0054 Epoch: 15 Global Step: 190640 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:15:51,919-Speed 3266.23 samples/sec Loss 1.6749 LearningRate 0.0054 Epoch: 15 Global Step: 190650 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:15:55,030-Speed 3293.33 samples/sec Loss 1.6783 LearningRate 0.0054 Epoch: 15 Global Step: 190660 Fp16 Grad Scale: 
8192 Required: 5 hours Training: 2022-04-27 18:15:58,128-Speed 3306.13 samples/sec Loss 1.7374 LearningRate 0.0054 Epoch: 15 Global Step: 190670 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:16:01,206-Speed 3327.38 samples/sec Loss 1.7163 LearningRate 0.0054 Epoch: 15 Global Step: 190680 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:16:04,349-Speed 3259.76 samples/sec Loss 1.7480 LearningRate 0.0054 Epoch: 15 Global Step: 190690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:07,465-Speed 3287.09 samples/sec Loss 1.7286 LearningRate 0.0054 Epoch: 15 Global Step: 190700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:10,557-Speed 3312.01 samples/sec Loss 1.7071 LearningRate 0.0054 Epoch: 15 Global Step: 190710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:13,666-Speed 3295.03 samples/sec Loss 1.7296 LearningRate 0.0054 Epoch: 15 Global Step: 190720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:16,783-Speed 3286.83 samples/sec Loss 1.6667 LearningRate 0.0054 Epoch: 15 Global Step: 190730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:19,962-Speed 3221.76 samples/sec Loss 1.7332 LearningRate 0.0054 Epoch: 15 Global Step: 190740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:23,049-Speed 3318.43 samples/sec Loss 1.6834 LearningRate 0.0054 Epoch: 15 Global Step: 190750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:26,201-Speed 3249.48 samples/sec Loss 1.7386 LearningRate 0.0054 Epoch: 15 Global Step: 190760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:29,336-Speed 3267.69 samples/sec Loss 1.6610 LearningRate 0.0054 Epoch: 15 Global Step: 190770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:32,413-Speed 3328.84 samples/sec Loss 1.7814 LearningRate 0.0054 Epoch: 15 Global Step: 190780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
18:16:35,500-Speed 3318.33 samples/sec Loss 1.6914 LearningRate 0.0054 Epoch: 15 Global Step: 190790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:16:38,556-Speed 3351.81 samples/sec Loss 1.7374 LearningRate 0.0054 Epoch: 15 Global Step: 190800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:41,641-Speed 3320.12 samples/sec Loss 1.7252 LearningRate 0.0054 Epoch: 15 Global Step: 190810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:44,723-Speed 3323.61 samples/sec Loss 1.6732 LearningRate 0.0054 Epoch: 15 Global Step: 190820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:47,863-Speed 3262.21 samples/sec Loss 1.7303 LearningRate 0.0054 Epoch: 15 Global Step: 190830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:50,963-Speed 3303.23 samples/sec Loss 1.7361 LearningRate 0.0054 Epoch: 15 Global Step: 190840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:54,079-Speed 3287.99 samples/sec Loss 1.7629 LearningRate 0.0054 Epoch: 15 Global Step: 190850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:16:57,146-Speed 3339.69 samples/sec Loss 1.7041 LearningRate 0.0054 Epoch: 15 Global Step: 190860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:00,312-Speed 3236.05 samples/sec Loss 1.7381 LearningRate 0.0054 Epoch: 15 Global Step: 190870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:03,453-Speed 3260.63 samples/sec Loss 1.7717 LearningRate 0.0054 Epoch: 15 Global Step: 190880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:06,597-Speed 3258.34 samples/sec Loss 1.7548 LearningRate 0.0054 Epoch: 15 Global Step: 190890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:09,695-Speed 3305.85 samples/sec Loss 1.6868 LearningRate 0.0054 Epoch: 15 Global Step: 190900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:12,795-Speed 3304.92 samples/sec Loss 
1.7014 LearningRate 0.0054 Epoch: 15 Global Step: 190910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:15,875-Speed 3325.00 samples/sec Loss 1.7283 LearningRate 0.0054 Epoch: 15 Global Step: 190920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:18,965-Speed 3315.38 samples/sec Loss 1.6995 LearningRate 0.0054 Epoch: 15 Global Step: 190930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:22,047-Speed 3323.91 samples/sec Loss 1.7417 LearningRate 0.0054 Epoch: 15 Global Step: 190940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:25,145-Speed 3305.99 samples/sec Loss 1.7872 LearningRate 0.0054 Epoch: 15 Global Step: 190950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:28,235-Speed 3315.22 samples/sec Loss 1.7445 LearningRate 0.0054 Epoch: 15 Global Step: 190960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:31,310-Speed 3331.52 samples/sec Loss 1.6964 LearningRate 0.0053 Epoch: 15 Global Step: 190970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:34,419-Speed 3294.31 samples/sec Loss 1.7254 LearningRate 0.0053 Epoch: 15 Global Step: 190980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:37,573-Speed 3248.33 samples/sec Loss 1.7573 LearningRate 0.0053 Epoch: 15 Global Step: 190990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:17:40,649-Speed 3330.16 samples/sec Loss 1.7487 LearningRate 0.0053 Epoch: 15 Global Step: 191000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:43,816-Speed 3234.18 samples/sec Loss 1.7425 LearningRate 0.0053 Epoch: 15 Global Step: 191010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:46,916-Speed 3304.39 samples/sec Loss 1.6669 LearningRate 0.0053 Epoch: 15 Global Step: 191020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:50,074-Speed 3243.39 samples/sec Loss 1.7800 LearningRate 0.0053 Epoch: 15 Global 
Step: 191030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:53,173-Speed 3305.42 samples/sec Loss 1.7173 LearningRate 0.0053 Epoch: 15 Global Step: 191040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:56,244-Speed 3335.41 samples/sec Loss 1.7321 LearningRate 0.0053 Epoch: 15 Global Step: 191050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:17:59,360-Speed 3286.59 samples/sec Loss 1.7094 LearningRate 0.0053 Epoch: 15 Global Step: 191060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:02,459-Speed 3306.03 samples/sec Loss 1.7323 LearningRate 0.0053 Epoch: 15 Global Step: 191070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:05,580-Speed 3282.27 samples/sec Loss 1.7515 LearningRate 0.0053 Epoch: 15 Global Step: 191080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:08,675-Speed 3308.68 samples/sec Loss 1.6880 LearningRate 0.0053 Epoch: 15 Global Step: 191090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:11,761-Speed 3319.71 samples/sec Loss 1.7552 LearningRate 0.0053 Epoch: 15 Global Step: 191100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:18:14,861-Speed 3303.89 samples/sec Loss 1.6835 LearningRate 0.0053 Epoch: 15 Global Step: 191110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:18:17,989-Speed 3275.18 samples/sec Loss 1.7769 LearningRate 0.0053 Epoch: 15 Global Step: 191120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:18:21,087-Speed 3305.77 samples/sec Loss 1.6891 LearningRate 0.0053 Epoch: 15 Global Step: 191130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:18:24,161-Speed 3332.68 samples/sec Loss 1.7435 LearningRate 0.0053 Epoch: 15 Global Step: 191140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:27,316-Speed 3246.39 samples/sec Loss 1.7364 LearningRate 0.0053 Epoch: 15 Global Step: 191150 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 18:18:30,475-Speed 3242.53 samples/sec Loss 1.6831 LearningRate 0.0053 Epoch: 15 Global Step: 191160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:33,631-Speed 3246.14 samples/sec Loss 1.7361 LearningRate 0.0053 Epoch: 15 Global Step: 191170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:36,824-Speed 3207.70 samples/sec Loss 1.6928 LearningRate 0.0053 Epoch: 15 Global Step: 191180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:39,908-Speed 3321.69 samples/sec Loss 1.7197 LearningRate 0.0053 Epoch: 15 Global Step: 191190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:42,976-Speed 3338.51 samples/sec Loss 1.7593 LearningRate 0.0053 Epoch: 15 Global Step: 191200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:46,079-Speed 3301.53 samples/sec Loss 1.7339 LearningRate 0.0053 Epoch: 15 Global Step: 191210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:49,224-Speed 3257.03 samples/sec Loss 1.6920 LearningRate 0.0053 Epoch: 15 Global Step: 191220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:52,365-Speed 3260.61 samples/sec Loss 1.7599 LearningRate 0.0053 Epoch: 15 Global Step: 191230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:18:55,534-Speed 3232.45 samples/sec Loss 1.7058 LearningRate 0.0053 Epoch: 15 Global Step: 191240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:18:58,625-Speed 3313.95 samples/sec Loss 1.7736 LearningRate 0.0053 Epoch: 15 Global Step: 191250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:19:01,706-Speed 3324.32 samples/sec Loss 1.7243 LearningRate 0.0053 Epoch: 15 Global Step: 191260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:04,842-Speed 3266.65 samples/sec Loss 1.6898 LearningRate 0.0053 Epoch: 15 Global Step: 191270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
18:19:07,951-Speed 3294.75 samples/sec Loss 1.7296 LearningRate 0.0053 Epoch: 15 Global Step: 191280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:11,092-Speed 3261.77 samples/sec Loss 1.7460 LearningRate 0.0053 Epoch: 15 Global Step: 191290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:14,214-Speed 3279.78 samples/sec Loss 1.7221 LearningRate 0.0053 Epoch: 15 Global Step: 191300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:17,347-Speed 3270.37 samples/sec Loss 1.7275 LearningRate 0.0053 Epoch: 15 Global Step: 191310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:20,435-Speed 3316.01 samples/sec Loss 1.7359 LearningRate 0.0053 Epoch: 15 Global Step: 191320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:23,542-Speed 3297.61 samples/sec Loss 1.6926 LearningRate 0.0053 Epoch: 15 Global Step: 191330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:26,676-Speed 3268.59 samples/sec Loss 1.6986 LearningRate 0.0053 Epoch: 15 Global Step: 191340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:29,755-Speed 3326.52 samples/sec Loss 1.7239 LearningRate 0.0053 Epoch: 15 Global Step: 191350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:32,812-Speed 3350.70 samples/sec Loss 1.7400 LearningRate 0.0053 Epoch: 15 Global Step: 191360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:19:35,956-Speed 3257.55 samples/sec Loss 1.8100 LearningRate 0.0053 Epoch: 15 Global Step: 191370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:19:39,077-Speed 3282.47 samples/sec Loss 1.7227 LearningRate 0.0053 Epoch: 15 Global Step: 191380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:19:42,185-Speed 3295.46 samples/sec Loss 1.6327 LearningRate 0.0053 Epoch: 15 Global Step: 191390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:45,250-Speed 3342.81 samples/sec Loss 
1.7325 LearningRate 0.0053 Epoch: 15 Global Step: 191400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:48,356-Speed 3297.03 samples/sec Loss 1.7224 LearningRate 0.0053 Epoch: 15 Global Step: 191410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:51,432-Speed 3330.54 samples/sec Loss 1.7511 LearningRate 0.0053 Epoch: 15 Global Step: 191420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:54,508-Speed 3329.95 samples/sec Loss 1.7472 LearningRate 0.0053 Epoch: 15 Global Step: 191430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:19:57,564-Speed 3352.10 samples/sec Loss 1.7070 LearningRate 0.0053 Epoch: 15 Global Step: 191440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:00,663-Speed 3305.27 samples/sec Loss 1.7394 LearningRate 0.0053 Epoch: 15 Global Step: 191450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:03,771-Speed 3296.22 samples/sec Loss 1.7309 LearningRate 0.0053 Epoch: 15 Global Step: 191460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:06,882-Speed 3292.61 samples/sec Loss 1.7055 LearningRate 0.0053 Epoch: 15 Global Step: 191470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:09,925-Speed 3366.28 samples/sec Loss 1.7109 LearningRate 0.0053 Epoch: 15 Global Step: 191480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:12,998-Speed 3333.20 samples/sec Loss 1.7744 LearningRate 0.0053 Epoch: 15 Global Step: 191490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:20:16,065-Speed 3339.19 samples/sec Loss 1.7287 LearningRate 0.0052 Epoch: 15 Global Step: 191500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:19,252-Speed 3213.99 samples/sec Loss 1.7069 LearningRate 0.0052 Epoch: 15 Global Step: 191510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:22,331-Speed 3327.59 samples/sec Loss 1.7411 LearningRate 0.0052 Epoch: 15 Global 
Step: 191520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:25,415-Speed 3321.09 samples/sec Loss 1.7057 LearningRate 0.0052 Epoch: 15 Global Step: 191530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:28,491-Speed 3329.21 samples/sec Loss 1.7703 LearningRate 0.0052 Epoch: 15 Global Step: 191540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:31,593-Speed 3302.11 samples/sec Loss 1.7812 LearningRate 0.0052 Epoch: 15 Global Step: 191550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:34,689-Speed 3309.03 samples/sec Loss 1.7147 LearningRate 0.0052 Epoch: 15 Global Step: 191560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:37,810-Speed 3281.56 samples/sec Loss 1.7245 LearningRate 0.0052 Epoch: 15 Global Step: 191570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:40,942-Speed 3270.63 samples/sec Loss 1.7353 LearningRate 0.0052 Epoch: 15 Global Step: 191580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:44,043-Speed 3303.11 samples/sec Loss 1.7036 LearningRate 0.0052 Epoch: 15 Global Step: 191590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:47,115-Speed 3335.10 samples/sec Loss 1.7406 LearningRate 0.0052 Epoch: 15 Global Step: 191600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:50,290-Speed 3226.00 samples/sec Loss 1.7474 LearningRate 0.0052 Epoch: 15 Global Step: 191610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:53,516-Speed 3175.39 samples/sec Loss 1.7109 LearningRate 0.0052 Epoch: 15 Global Step: 191620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:56,630-Speed 3288.64 samples/sec Loss 1.7700 LearningRate 0.0052 Epoch: 15 Global Step: 191630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:20:59,722-Speed 3312.42 samples/sec Loss 1.6502 LearningRate 0.0052 Epoch: 15 Global Step: 191640 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 18:21:02,929-Speed 3193.83 samples/sec Loss 1.7449 LearningRate 0.0052 Epoch: 15 Global Step: 191650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:06,073-Speed 3258.74 samples/sec Loss 1.7074 LearningRate 0.0052 Epoch: 15 Global Step: 191660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:09,140-Speed 3339.54 samples/sec Loss 1.7175 LearningRate 0.0052 Epoch: 15 Global Step: 191670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:12,248-Speed 3295.87 samples/sec Loss 1.7834 LearningRate 0.0052 Epoch: 15 Global Step: 191680 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:15,399-Speed 3250.27 samples/sec Loss 1.7455 LearningRate 0.0052 Epoch: 15 Global Step: 191690 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:18,490-Speed 3314.06 samples/sec Loss 1.7478 LearningRate 0.0052 Epoch: 15 Global Step: 191700 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:21,556-Speed 3341.01 samples/sec Loss 1.7767 LearningRate 0.0052 Epoch: 15 Global Step: 191710 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:24,676-Speed 3283.33 samples/sec Loss 1.7264 LearningRate 0.0052 Epoch: 15 Global Step: 191720 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:27,759-Speed 3322.25 samples/sec Loss 1.7448 LearningRate 0.0052 Epoch: 15 Global Step: 191730 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:30,917-Speed 3243.46 samples/sec Loss 1.7509 LearningRate 0.0052 Epoch: 15 Global Step: 191740 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:33,984-Speed 3339.90 samples/sec Loss 1.7507 LearningRate 0.0052 Epoch: 15 Global Step: 191750 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:37,106-Speed 3280.51 samples/sec Loss 1.7146 LearningRate 0.0052 Epoch: 15 Global Step: 191760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 
18:21:40,196-Speed 3315.62 samples/sec Loss 1.7190 LearningRate 0.0052 Epoch: 15 Global Step: 191770 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:21:43,285-Speed 3315.94 samples/sec Loss 1.7521 LearningRate 0.0052 Epoch: 15 Global Step: 191780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:46,391-Speed 3298.31 samples/sec Loss 1.7550 LearningRate 0.0052 Epoch: 15 Global Step: 191790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:49,486-Speed 3309.48 samples/sec Loss 1.6939 LearningRate 0.0052 Epoch: 15 Global Step: 191800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:52,600-Speed 3288.99 samples/sec Loss 1.7532 LearningRate 0.0052 Epoch: 15 Global Step: 191810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:55,687-Speed 3317.68 samples/sec Loss 1.7016 LearningRate 0.0052 Epoch: 15 Global Step: 191820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:21:58,790-Speed 3301.28 samples/sec Loss 1.7196 LearningRate 0.0052 Epoch: 15 Global Step: 191830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:01,892-Speed 3302.81 samples/sec Loss 1.7322 LearningRate 0.0052 Epoch: 15 Global Step: 191840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:05,067-Speed 3225.84 samples/sec Loss 1.7314 LearningRate 0.0052 Epoch: 15 Global Step: 191850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:08,201-Speed 3268.12 samples/sec Loss 1.7471 LearningRate 0.0052 Epoch: 15 Global Step: 191860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:11,308-Speed 3297.60 samples/sec Loss 1.6903 LearningRate 0.0052 Epoch: 15 Global Step: 191870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:14,426-Speed 3285.23 samples/sec Loss 1.7653 LearningRate 0.0052 Epoch: 15 Global Step: 191880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:17,490-Speed 3342.99 samples/sec Loss 
1.7258 LearningRate 0.0052 Epoch: 15 Global Step: 191890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:20,567-Speed 3328.88 samples/sec Loss 1.7788 LearningRate 0.0052 Epoch: 15 Global Step: 191900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:23,723-Speed 3246.34 samples/sec Loss 1.7689 LearningRate 0.0052 Epoch: 15 Global Step: 191910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:26,818-Speed 3309.28 samples/sec Loss 1.7208 LearningRate 0.0052 Epoch: 15 Global Step: 191920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:29,885-Speed 3339.45 samples/sec Loss 1.7122 LearningRate 0.0052 Epoch: 15 Global Step: 191930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:33,013-Speed 3275.12 samples/sec Loss 1.7016 LearningRate 0.0052 Epoch: 15 Global Step: 191940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:36,112-Speed 3305.40 samples/sec Loss 1.7384 LearningRate 0.0052 Epoch: 15 Global Step: 191950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:39,247-Speed 3267.30 samples/sec Loss 1.7197 LearningRate 0.0052 Epoch: 15 Global Step: 191960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:42,429-Speed 3218.50 samples/sec Loss 1.6851 LearningRate 0.0052 Epoch: 15 Global Step: 191970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:45,534-Speed 3299.26 samples/sec Loss 1.7017 LearningRate 0.0052 Epoch: 15 Global Step: 191980 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:22:48,620-Speed 3318.87 samples/sec Loss 1.6754 LearningRate 0.0052 Epoch: 15 Global Step: 191990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:51,889-Speed 3133.97 samples/sec Loss 1.7359 LearningRate 0.0052 Epoch: 15 Global Step: 192000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:55,041-Speed 3249.25 samples/sec Loss 1.7066 LearningRate 0.0052 Epoch: 15 Global Step: 
192010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:22:58,157-Speed 3287.80 samples/sec Loss 1.6789 LearningRate 0.0052 Epoch: 15 Global Step: 192020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:01,247-Speed 3315.07 samples/sec Loss 1.7878 LearningRate 0.0052 Epoch: 15 Global Step: 192030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:04,416-Speed 3231.80 samples/sec Loss 1.6584 LearningRate 0.0052 Epoch: 15 Global Step: 192040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:07,494-Speed 3327.95 samples/sec Loss 1.7122 LearningRate 0.0051 Epoch: 15 Global Step: 192050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:10,562-Speed 3338.34 samples/sec Loss 1.7493 LearningRate 0.0051 Epoch: 15 Global Step: 192060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:13,714-Speed 3250.86 samples/sec Loss 1.7341 LearningRate 0.0051 Epoch: 15 Global Step: 192070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:16,881-Speed 3234.43 samples/sec Loss 1.7342 LearningRate 0.0051 Epoch: 15 Global Step: 192080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:20,006-Speed 3277.44 samples/sec Loss 1.7382 LearningRate 0.0051 Epoch: 15 Global Step: 192090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:23,115-Speed 3294.61 samples/sec Loss 1.7739 LearningRate 0.0051 Epoch: 15 Global Step: 192100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:26,223-Speed 3296.01 samples/sec Loss 1.7259 LearningRate 0.0051 Epoch: 15 Global Step: 192110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:29,381-Speed 3244.08 samples/sec Loss 1.7655 LearningRate 0.0051 Epoch: 15 Global Step: 192120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:32,458-Speed 3328.23 samples/sec Loss 1.7392 LearningRate 0.0051 Epoch: 15 Global Step: 192130 Fp16 Grad Scale: 16384 Required: 5 
hours Training: 2022-04-27 18:23:35,586-Speed 3274.75 samples/sec Loss 1.7521 LearningRate 0.0051 Epoch: 15 Global Step: 192140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:38,688-Speed 3302.02 samples/sec Loss 1.7586 LearningRate 0.0051 Epoch: 15 Global Step: 192150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:41,860-Speed 3229.63 samples/sec Loss 1.7113 LearningRate 0.0051 Epoch: 15 Global Step: 192160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:44,963-Speed 3300.42 samples/sec Loss 1.7303 LearningRate 0.0051 Epoch: 15 Global Step: 192170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:48,070-Speed 3297.19 samples/sec Loss 1.7851 LearningRate 0.0051 Epoch: 15 Global Step: 192180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:23:51,221-Speed 3250.53 samples/sec Loss 1.7268 LearningRate 0.0051 Epoch: 15 Global Step: 192190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:23:54,359-Speed 3264.15 samples/sec Loss 1.7532 LearningRate 0.0051 Epoch: 15 Global Step: 192200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:23:57,472-Speed 3290.79 samples/sec Loss 1.7134 LearningRate 0.0051 Epoch: 15 Global Step: 192210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:24:00,625-Speed 3247.98 samples/sec Loss 1.7336 LearningRate 0.0051 Epoch: 15 Global Step: 192220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:24:03,727-Speed 3302.27 samples/sec Loss 1.7214 LearningRate 0.0051 Epoch: 15 Global Step: 192230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:06,820-Speed 3311.99 samples/sec Loss 1.7839 LearningRate 0.0051 Epoch: 15 Global Step: 192240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:09,888-Speed 3338.61 samples/sec Loss 1.7108 LearningRate 0.0051 Epoch: 15 Global Step: 192250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
18:24:13,023-Speed 3266.32 samples/sec Loss 1.7501 LearningRate 0.0051 Epoch: 15 Global Step: 192260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:16,176-Speed 3249.77 samples/sec Loss 1.7826 LearningRate 0.0051 Epoch: 15 Global Step: 192270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:19,262-Speed 3318.45 samples/sec Loss 1.7423 LearningRate 0.0051 Epoch: 15 Global Step: 192280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:22,344-Speed 3323.71 samples/sec Loss 1.7876 LearningRate 0.0051 Epoch: 15 Global Step: 192290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:25,541-Speed 3204.55 samples/sec Loss 1.7584 LearningRate 0.0051 Epoch: 15 Global Step: 192300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:28,645-Speed 3299.79 samples/sec Loss 1.7888 LearningRate 0.0051 Epoch: 15 Global Step: 192310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:31,764-Speed 3284.22 samples/sec Loss 1.7162 LearningRate 0.0051 Epoch: 15 Global Step: 192320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:34,873-Speed 3294.55 samples/sec Loss 1.7366 LearningRate 0.0051 Epoch: 15 Global Step: 192330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:24:37,946-Speed 3333.82 samples/sec Loss 1.7328 LearningRate 0.0051 Epoch: 15 Global Step: 192340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:24:41,121-Speed 3225.68 samples/sec Loss 1.7416 LearningRate 0.0051 Epoch: 15 Global Step: 192350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:24:44,231-Speed 3293.44 samples/sec Loss 1.7303 LearningRate 0.0051 Epoch: 15 Global Step: 192360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:24:47,344-Speed 3290.60 samples/sec Loss 1.7520 LearningRate 0.0051 Epoch: 15 Global Step: 192370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:50,460-Speed 3286.94 samples/sec Loss 
1.7431 LearningRate 0.0051 Epoch: 15 Global Step: 192380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:53,541-Speed 3324.66 samples/sec Loss 1.7313 LearningRate 0.0051 Epoch: 15 Global Step: 192390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:56,620-Speed 3327.51 samples/sec Loss 1.7527 LearningRate 0.0051 Epoch: 15 Global Step: 192400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:24:59,751-Speed 3271.47 samples/sec Loss 1.7775 LearningRate 0.0051 Epoch: 15 Global Step: 192410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:02,829-Speed 3327.42 samples/sec Loss 1.7484 LearningRate 0.0051 Epoch: 15 Global Step: 192420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:05,930-Speed 3303.80 samples/sec Loss 1.7241 LearningRate 0.0051 Epoch: 15 Global Step: 192430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:09,008-Speed 3328.06 samples/sec Loss 1.7813 LearningRate 0.0051 Epoch: 15 Global Step: 192440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:12,150-Speed 3259.23 samples/sec Loss 1.7487 LearningRate 0.0051 Epoch: 15 Global Step: 192450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:15,247-Speed 3307.34 samples/sec Loss 1.7238 LearningRate 0.0051 Epoch: 15 Global Step: 192460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:18,350-Speed 3300.81 samples/sec Loss 1.7563 LearningRate 0.0051 Epoch: 15 Global Step: 192470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:25:21,412-Speed 3345.43 samples/sec Loss 1.7049 LearningRate 0.0051 Epoch: 15 Global Step: 192480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:25:24,499-Speed 3318.69 samples/sec Loss 1.8137 LearningRate 0.0051 Epoch: 15 Global Step: 192490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:27,580-Speed 3325.10 samples/sec Loss 1.7410 LearningRate 0.0051 Epoch: 15 Global 
Step: 192500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:30,658-Speed 3327.63 samples/sec Loss 1.6946 LearningRate 0.0051 Epoch: 15 Global Step: 192510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:33,719-Speed 3346.26 samples/sec Loss 1.7528 LearningRate 0.0051 Epoch: 15 Global Step: 192520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:36,792-Speed 3332.64 samples/sec Loss 1.6959 LearningRate 0.0051 Epoch: 15 Global Step: 192530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:39,915-Speed 3280.75 samples/sec Loss 1.6994 LearningRate 0.0051 Epoch: 15 Global Step: 192540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:43,032-Speed 3285.83 samples/sec Loss 1.7174 LearningRate 0.0051 Epoch: 15 Global Step: 192550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:46,170-Speed 3264.51 samples/sec Loss 1.7479 LearningRate 0.0051 Epoch: 15 Global Step: 192560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:49,282-Speed 3290.98 samples/sec Loss 1.6915 LearningRate 0.0051 Epoch: 15 Global Step: 192570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:52,410-Speed 3275.00 samples/sec Loss 1.7222 LearningRate 0.0051 Epoch: 15 Global Step: 192580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:55,491-Speed 3324.27 samples/sec Loss 1.7488 LearningRate 0.0051 Epoch: 15 Global Step: 192590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:25:58,540-Speed 3359.90 samples/sec Loss 1.7841 LearningRate 0.0050 Epoch: 15 Global Step: 192600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:01,597-Speed 3350.50 samples/sec Loss 1.7391 LearningRate 0.0050 Epoch: 15 Global Step: 192610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:04,692-Speed 3309.98 samples/sec Loss 1.7808 LearningRate 0.0050 Epoch: 15 Global Step: 192620 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 18:26:07,796-Speed 3299.82 samples/sec Loss 1.7277 LearningRate 0.0050 Epoch: 15 Global Step: 192630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:10,907-Speed 3292.03 samples/sec Loss 1.7350 LearningRate 0.0050 Epoch: 15 Global Step: 192640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:13,991-Speed 3321.25 samples/sec Loss 1.7257 LearningRate 0.0050 Epoch: 15 Global Step: 192650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:17,135-Speed 3258.60 samples/sec Loss 1.7219 LearningRate 0.0050 Epoch: 15 Global Step: 192660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:20,197-Speed 3344.88 samples/sec Loss 1.7177 LearningRate 0.0050 Epoch: 15 Global Step: 192670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:23,307-Speed 3293.70 samples/sec Loss 1.6963 LearningRate 0.0050 Epoch: 15 Global Step: 192680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:26,470-Speed 3239.10 samples/sec Loss 1.7617 LearningRate 0.0050 Epoch: 15 Global Step: 192690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:26:29,632-Speed 3238.89 samples/sec Loss 1.7637 LearningRate 0.0050 Epoch: 15 Global Step: 192700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:26:32,758-Speed 3277.17 samples/sec Loss 1.7364 LearningRate 0.0050 Epoch: 15 Global Step: 192710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:26:35,873-Speed 3288.26 samples/sec Loss 1.7324 LearningRate 0.0050 Epoch: 15 Global Step: 192720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:26:38,952-Speed 3326.59 samples/sec Loss 1.7189 LearningRate 0.0050 Epoch: 15 Global Step: 192730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:42,045-Speed 3311.25 samples/sec Loss 1.7824 LearningRate 0.0050 Epoch: 15 Global Step: 192740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
18:26:45,185-Speed 3263.25 samples/sec Loss 1.7489 LearningRate 0.0050 Epoch: 15 Global Step: 192750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:48,346-Speed 3239.53 samples/sec Loss 1.7476 LearningRate 0.0050 Epoch: 15 Global Step: 192760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:51,450-Speed 3300.46 samples/sec Loss 1.7441 LearningRate 0.0050 Epoch: 15 Global Step: 192770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:54,591-Speed 3261.45 samples/sec Loss 1.7709 LearningRate 0.0050 Epoch: 15 Global Step: 192780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:26:57,653-Speed 3345.21 samples/sec Loss 1.7883 LearningRate 0.0050 Epoch: 15 Global Step: 192790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:27:00,748-Speed 3309.60 samples/sec Loss 1.6746 LearningRate 0.0050 Epoch: 15 Global Step: 192800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:27:03,844-Speed 3308.46 samples/sec Loss 1.7404 LearningRate 0.0050 Epoch: 15 Global Step: 192810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:27:06,911-Speed 3338.91 samples/sec Loss 1.7183 LearningRate 0.0050 Epoch: 15 Global Step: 192820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:27:10,028-Speed 3286.63 samples/sec Loss 1.7136 LearningRate 0.0050 Epoch: 15 Global Step: 192830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:13,143-Speed 3289.26 samples/sec Loss 1.6966 LearningRate 0.0050 Epoch: 15 Global Step: 192840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:16,307-Speed 3236.69 samples/sec Loss 1.7246 LearningRate 0.0050 Epoch: 15 Global Step: 192850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:19,434-Speed 3275.76 samples/sec Loss 1.7664 LearningRate 0.0050 Epoch: 15 Global Step: 192860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:22,494-Speed 3347.77 samples/sec Loss 
1.7589 LearningRate 0.0050 Epoch: 15 Global Step: 192870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:25,577-Speed 3322.12 samples/sec Loss 1.7852 LearningRate 0.0050 Epoch: 15 Global Step: 192880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:28,686-Speed 3295.07 samples/sec Loss 1.7228 LearningRate 0.0050 Epoch: 15 Global Step: 192890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:31,831-Speed 3257.17 samples/sec Loss 1.7692 LearningRate 0.0050 Epoch: 15 Global Step: 192900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:34,903-Speed 3334.48 samples/sec Loss 1.7303 LearningRate 0.0050 Epoch: 15 Global Step: 192910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:27:38,049-Speed 3256.09 samples/sec Loss 1.8089 LearningRate 0.0050 Epoch: 15 Global Step: 192920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:27:41,193-Speed 3258.26 samples/sec Loss 1.7778 LearningRate 0.0050 Epoch: 15 Global Step: 192930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:27:44,269-Speed 3329.71 samples/sec Loss 1.7810 LearningRate 0.0050 Epoch: 15 Global Step: 192940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:27:47,353-Speed 3320.77 samples/sec Loss 1.7915 LearningRate 0.0050 Epoch: 15 Global Step: 192950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:27:50,464-Speed 3292.88 samples/sec Loss 1.7492 LearningRate 0.0050 Epoch: 15 Global Step: 192960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:27:53,558-Speed 3311.54 samples/sec Loss 1.7445 LearningRate 0.0050 Epoch: 15 Global Step: 192970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:27:56,658-Speed 3303.07 samples/sec Loss 1.7508 LearningRate 0.0050 Epoch: 15 Global Step: 192980 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:27:59,739-Speed 3325.70 samples/sec Loss 1.7739 LearningRate 0.0050 Epoch: 15 Global 
Step: 192990 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:28:02,953-Speed 3186.29 samples/sec Loss 1.7243 LearningRate 0.0050 Epoch: 15 Global Step: 193000 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:28:06,167-Speed 3188.22 samples/sec Loss 1.7044 LearningRate 0.0050 Epoch: 15 Global Step: 193010 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:28:09,270-Speed 3300.34 samples/sec Loss 1.7663 LearningRate 0.0050 Epoch: 15 Global Step: 193020 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:28:12,389-Speed 3283.87 samples/sec Loss 1.7077 LearningRate 0.0050 Epoch: 15 Global Step: 193030 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:28:15,565-Speed 3225.54 samples/sec Loss 1.7627 LearningRate 0.0050 Epoch: 15 Global Step: 193040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:18,747-Speed 3218.91 samples/sec Loss 1.7819 LearningRate 0.0050 Epoch: 15 Global Step: 193050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:21,811-Speed 3342.90 samples/sec Loss 1.7238 LearningRate 0.0050 Epoch: 15 Global Step: 193060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:24,956-Speed 3256.64 samples/sec Loss 1.7482 LearningRate 0.0050 Epoch: 15 Global Step: 193070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:28,037-Speed 3325.26 samples/sec Loss 1.7231 LearningRate 0.0050 Epoch: 15 Global Step: 193080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:31,205-Speed 3232.76 samples/sec Loss 1.7553 LearningRate 0.0050 Epoch: 15 Global Step: 193090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:34,311-Speed 3297.67 samples/sec Loss 1.7181 LearningRate 0.0050 Epoch: 15 Global Step: 193100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:37,399-Speed 3317.50 samples/sec Loss 1.7775 LearningRate 0.0050 Epoch: 15 Global Step: 193110 Fp16 Grad Scale: 16384 Required: 5 
hours Training: 2022-04-27 18:28:40,494-Speed 3310.15 samples/sec Loss 1.7616 LearningRate 0.0050 Epoch: 15 Global Step: 193120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:43,654-Speed 3240.98 samples/sec Loss 1.7363 LearningRate 0.0050 Epoch: 15 Global Step: 193130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:46,715-Speed 3346.50 samples/sec Loss 1.7221 LearningRate 0.0050 Epoch: 15 Global Step: 193140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:28:49,862-Speed 3255.14 samples/sec Loss 1.7645 LearningRate 0.0050 Epoch: 15 Global Step: 193150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:28:53,008-Speed 3255.91 samples/sec Loss 1.7670 LearningRate 0.0049 Epoch: 15 Global Step: 193160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:28:56,120-Speed 3291.90 samples/sec Loss 1.7899 LearningRate 0.0049 Epoch: 15 Global Step: 193170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:28:59,193-Speed 3332.67 samples/sec Loss 1.7182 LearningRate 0.0049 Epoch: 15 Global Step: 193180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:02,345-Speed 3249.87 samples/sec Loss 1.7275 LearningRate 0.0049 Epoch: 15 Global Step: 193190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:05,452-Speed 3296.94 samples/sec Loss 1.8009 LearningRate 0.0049 Epoch: 15 Global Step: 193200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:08,593-Speed 3260.86 samples/sec Loss 1.8037 LearningRate 0.0049 Epoch: 15 Global Step: 193210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:11,727-Speed 3268.23 samples/sec Loss 1.8027 LearningRate 0.0049 Epoch: 15 Global Step: 193220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:14,832-Speed 3299.32 samples/sec Loss 1.7305 LearningRate 0.0049 Epoch: 15 Global Step: 193230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
18:29:17,918-Speed 3318.82 samples/sec Loss 1.7641 LearningRate 0.0049 Epoch: 15 Global Step: 193240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:21,048-Speed 3272.65 samples/sec Loss 1.7805 LearningRate 0.0049 Epoch: 15 Global Step: 193250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:24,123-Speed 3331.51 samples/sec Loss 1.7192 LearningRate 0.0049 Epoch: 15 Global Step: 193260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:29:27,214-Speed 3313.11 samples/sec Loss 1.7786 LearningRate 0.0049 Epoch: 15 Global Step: 193270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:29:30,296-Speed 3324.12 samples/sec Loss 1.7715 LearningRate 0.0049 Epoch: 15 Global Step: 193280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:29:33,360-Speed 3343.31 samples/sec Loss 1.7387 LearningRate 0.0049 Epoch: 15 Global Step: 193290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:29:36,431-Speed 3335.02 samples/sec Loss 1.8132 LearningRate 0.0049 Epoch: 15 Global Step: 193300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:29:39,461-Speed 3380.61 samples/sec Loss 1.7263 LearningRate 0.0049 Epoch: 15 Global Step: 193310 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:29:42,531-Speed 3336.86 samples/sec Loss 1.7396 LearningRate 0.0049 Epoch: 15 Global Step: 193320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:29:45,619-Speed 3317.43 samples/sec Loss 1.8188 LearningRate 0.0049 Epoch: 15 Global Step: 193330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:29:48,761-Speed 3259.65 samples/sec Loss 1.7599 LearningRate 0.0049 Epoch: 15 Global Step: 193340 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:29:51,894-Speed 3269.63 samples/sec Loss 1.7521 LearningRate 0.0049 Epoch: 15 Global Step: 193350 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:29:55,024-Speed 3271.93 samples/sec Loss 
1.7479 LearningRate 0.0049 Epoch: 15 Global Step: 193360 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:29:58,142-Speed 3285.18 samples/sec Loss 1.7591 LearningRate 0.0049 Epoch: 15 Global Step: 193370 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:01,253-Speed 3293.08 samples/sec Loss 1.7387 LearningRate 0.0049 Epoch: 15 Global Step: 193380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:04,355-Speed 3302.27 samples/sec Loss 1.7762 LearningRate 0.0049 Epoch: 15 Global Step: 193390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:07,439-Speed 3321.16 samples/sec Loss 1.7479 LearningRate 0.0049 Epoch: 15 Global Step: 193400 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:10,508-Speed 3337.34 samples/sec Loss 1.7335 LearningRate 0.0049 Epoch: 15 Global Step: 193410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:13,636-Speed 3275.00 samples/sec Loss 1.7632 LearningRate 0.0049 Epoch: 15 Global Step: 193420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:16,792-Speed 3245.84 samples/sec Loss 1.7938 LearningRate 0.0049 Epoch: 15 Global Step: 193430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:19,929-Speed 3267.62 samples/sec Loss 1.7457 LearningRate 0.0049 Epoch: 15 Global Step: 193440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:23,057-Speed 3275.22 samples/sec Loss 1.6693 LearningRate 0.0049 Epoch: 15 Global Step: 193450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:26,140-Speed 3322.75 samples/sec Loss 1.7617 LearningRate 0.0049 Epoch: 15 Global Step: 193460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:29,308-Speed 3232.74 samples/sec Loss 1.7650 LearningRate 0.0049 Epoch: 15 Global Step: 193470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:32,423-Speed 3288.60 samples/sec Loss 1.7792 LearningRate 0.0049 Epoch: 15 Global 
Step: 193480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:30:35,535-Speed 3291.79 samples/sec Loss 1.7915 LearningRate 0.0049 Epoch: 15 Global Step: 193490 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:38,632-Speed 3307.01 samples/sec Loss 1.7487 LearningRate 0.0049 Epoch: 15 Global Step: 193500 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:41,736-Speed 3299.87 samples/sec Loss 1.7708 LearningRate 0.0049 Epoch: 15 Global Step: 193510 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:44,897-Speed 3240.93 samples/sec Loss 1.7782 LearningRate 0.0049 Epoch: 15 Global Step: 193520 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:48,050-Speed 3248.95 samples/sec Loss 1.7279 LearningRate 0.0049 Epoch: 15 Global Step: 193530 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:51,177-Speed 3275.59 samples/sec Loss 1.7721 LearningRate 0.0049 Epoch: 15 Global Step: 193540 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:54,330-Speed 3248.63 samples/sec Loss 1.7829 LearningRate 0.0049 Epoch: 15 Global Step: 193550 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:30:57,486-Speed 3245.14 samples/sec Loss 1.8050 LearningRate 0.0049 Epoch: 15 Global Step: 193560 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:31:00,650-Speed 3237.63 samples/sec Loss 1.7620 LearningRate 0.0049 Epoch: 15 Global Step: 193570 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:31:03,824-Speed 3227.93 samples/sec Loss 1.7883 LearningRate 0.0049 Epoch: 15 Global Step: 193580 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:31:06,997-Speed 3228.08 samples/sec Loss 1.7186 LearningRate 0.0049 Epoch: 15 Global Step: 193590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:10,042-Speed 3362.84 samples/sec Loss 1.7673 LearningRate 0.0049 Epoch: 15 Global Step: 193600 Fp16 Grad Scale: 16384 Required: 5 hours 
Training: 2022-04-27 18:31:13,186-Speed 3259.11 samples/sec Loss 1.7447 LearningRate 0.0049 Epoch: 15 Global Step: 193610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:16,352-Speed 3235.48 samples/sec Loss 1.7403 LearningRate 0.0049 Epoch: 15 Global Step: 193620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:19,472-Speed 3282.77 samples/sec Loss 1.7858 LearningRate 0.0049 Epoch: 15 Global Step: 193630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:22,580-Speed 3295.44 samples/sec Loss 1.8242 LearningRate 0.0049 Epoch: 15 Global Step: 193640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:25,824-Speed 3157.38 samples/sec Loss 1.7538 LearningRate 0.0049 Epoch: 15 Global Step: 193650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:28,910-Speed 3319.39 samples/sec Loss 1.7659 LearningRate 0.0049 Epoch: 15 Global Step: 193660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:31,998-Speed 3316.90 samples/sec Loss 1.7365 LearningRate 0.0049 Epoch: 15 Global Step: 193670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:35,056-Speed 3349.91 samples/sec Loss 1.6873 LearningRate 0.0049 Epoch: 15 Global Step: 193680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:38,129-Speed 3333.81 samples/sec Loss 1.7877 LearningRate 0.0049 Epoch: 15 Global Step: 193690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:31:41,232-Speed 3300.38 samples/sec Loss 1.7336 LearningRate 0.0049 Epoch: 15 Global Step: 193700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:31:44,318-Speed 3319.47 samples/sec Loss 1.7669 LearningRate 0.0049 Epoch: 15 Global Step: 193710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:47,390-Speed 3334.24 samples/sec Loss 1.7508 LearningRate 0.0048 Epoch: 15 Global Step: 193720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:50,585-Speed 
3205.95 samples/sec Loss 1.7524 LearningRate 0.0048 Epoch: 15 Global Step: 193730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:53,848-Speed 3139.90 samples/sec Loss 1.7582 LearningRate 0.0048 Epoch: 15 Global Step: 193740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:31:56,965-Speed 3285.68 samples/sec Loss 1.6667 LearningRate 0.0048 Epoch: 15 Global Step: 193750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:00,031-Speed 3341.05 samples/sec Loss 1.7512 LearningRate 0.0048 Epoch: 15 Global Step: 193760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:03,101-Speed 3336.92 samples/sec Loss 1.7544 LearningRate 0.0048 Epoch: 15 Global Step: 193770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:06,164-Speed 3343.90 samples/sec Loss 1.7063 LearningRate 0.0048 Epoch: 15 Global Step: 193780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:09,216-Speed 3356.13 samples/sec Loss 1.7242 LearningRate 0.0048 Epoch: 15 Global Step: 193790 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:12,266-Speed 3358.34 samples/sec Loss 1.6917 LearningRate 0.0048 Epoch: 15 Global Step: 193800 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:15,394-Speed 3275.28 samples/sec Loss 1.7885 LearningRate 0.0048 Epoch: 15 Global Step: 193810 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:18,550-Speed 3245.51 samples/sec Loss 1.7903 LearningRate 0.0048 Epoch: 15 Global Step: 193820 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:21,640-Speed 3314.94 samples/sec Loss 1.7003 LearningRate 0.0048 Epoch: 15 Global Step: 193830 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:24,742-Speed 3302.25 samples/sec Loss 1.8082 LearningRate 0.0048 Epoch: 15 Global Step: 193840 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:27,891-Speed 3252.40 samples/sec Loss 1.7638 LearningRate 
0.0048 Epoch: 15 Global Step: 193850 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:30,964-Speed 3334.02 samples/sec Loss 1.7551 LearningRate 0.0048 Epoch: 15 Global Step: 193860 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:34,056-Speed 3312.76 samples/sec Loss 1.7017 LearningRate 0.0048 Epoch: 15 Global Step: 193870 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:37,155-Speed 3304.88 samples/sec Loss 1.7782 LearningRate 0.0048 Epoch: 15 Global Step: 193880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:32:40,293-Speed 3264.87 samples/sec Loss 1.7384 LearningRate 0.0048 Epoch: 15 Global Step: 193890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:43,439-Speed 3255.59 samples/sec Loss 1.7360 LearningRate 0.0048 Epoch: 15 Global Step: 193900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:46,565-Speed 3277.49 samples/sec Loss 1.7183 LearningRate 0.0048 Epoch: 15 Global Step: 193910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:49,733-Speed 3232.45 samples/sec Loss 1.7731 LearningRate 0.0048 Epoch: 15 Global Step: 193920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:52,849-Speed 3287.28 samples/sec Loss 1.7577 LearningRate 0.0048 Epoch: 15 Global Step: 193930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:55,948-Speed 3306.26 samples/sec Loss 1.7891 LearningRate 0.0048 Epoch: 15 Global Step: 193940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:32:59,068-Speed 3282.83 samples/sec Loss 1.7679 LearningRate 0.0048 Epoch: 15 Global Step: 193950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:02,192-Speed 3279.05 samples/sec Loss 1.7781 LearningRate 0.0048 Epoch: 15 Global Step: 193960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:05,438-Speed 3155.69 samples/sec Loss 1.7749 LearningRate 0.0048 Epoch: 15 Global Step: 193970 Fp16 Grad 
Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:08,516-Speed 3327.90 samples/sec Loss 1.7358 LearningRate 0.0048 Epoch: 15 Global Step: 193980 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:11,644-Speed 3274.10 samples/sec Loss 1.7636 LearningRate 0.0048 Epoch: 15 Global Step: 193990 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:14,861-Speed 3184.22 samples/sec Loss 1.8029 LearningRate 0.0048 Epoch: 15 Global Step: 194000 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:18,073-Speed 3188.72 samples/sec Loss 1.6925 LearningRate 0.0048 Epoch: 15 Global Step: 194010 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:21,143-Speed 3336.68 samples/sec Loss 1.7447 LearningRate 0.0048 Epoch: 15 Global Step: 194020 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:24,241-Speed 3306.41 samples/sec Loss 1.7618 LearningRate 0.0048 Epoch: 15 Global Step: 194030 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:27,386-Speed 3257.48 samples/sec Loss 1.7799 LearningRate 0.0048 Epoch: 15 Global Step: 194040 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:30,576-Speed 3210.88 samples/sec Loss 1.7250 LearningRate 0.0048 Epoch: 15 Global Step: 194050 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:33,671-Speed 3309.64 samples/sec Loss 1.7633 LearningRate 0.0048 Epoch: 15 Global Step: 194060 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:33:36,816-Speed 3256.53 samples/sec Loss 1.7442 LearningRate 0.0048 Epoch: 15 Global Step: 194070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:39,906-Speed 3315.13 samples/sec Loss 1.7643 LearningRate 0.0048 Epoch: 15 Global Step: 194080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:43,130-Speed 3176.81 samples/sec Loss 1.7660 LearningRate 0.0048 Epoch: 15 Global Step: 194090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
18:33:46,241-Speed 3293.27 samples/sec Loss 1.7592 LearningRate 0.0048 Epoch: 15 Global Step: 194100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:49,399-Speed 3243.28 samples/sec Loss 1.7516 LearningRate 0.0048 Epoch: 15 Global Step: 194110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:52,542-Speed 3258.80 samples/sec Loss 1.7535 LearningRate 0.0048 Epoch: 15 Global Step: 194120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:55,642-Speed 3304.87 samples/sec Loss 1.7594 LearningRate 0.0048 Epoch: 15 Global Step: 194130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:33:58,706-Speed 3342.83 samples/sec Loss 1.7906 LearningRate 0.0048 Epoch: 15 Global Step: 194140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:01,831-Speed 3277.08 samples/sec Loss 1.7671 LearningRate 0.0048 Epoch: 15 Global Step: 194150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:04,969-Speed 3264.12 samples/sec Loss 1.7444 LearningRate 0.0048 Epoch: 15 Global Step: 194160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:08,031-Speed 3345.79 samples/sec Loss 1.7352 LearningRate 0.0048 Epoch: 15 Global Step: 194170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:34:11,089-Speed 3349.40 samples/sec Loss 1.7478 LearningRate 0.0048 Epoch: 15 Global Step: 194180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:34:14,181-Speed 3313.09 samples/sec Loss 1.7372 LearningRate 0.0048 Epoch: 15 Global Step: 194190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:17,293-Speed 3291.44 samples/sec Loss 1.7111 LearningRate 0.0048 Epoch: 15 Global Step: 194200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:20,371-Speed 3328.41 samples/sec Loss 1.7315 LearningRate 0.0048 Epoch: 15 Global Step: 194210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:23,474-Speed 3300.27 samples/sec Loss 
1.7975 LearningRate 0.0048 Epoch: 15 Global Step: 194220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:26,565-Speed 3314.05 samples/sec Loss 1.7468 LearningRate 0.0048 Epoch: 15 Global Step: 194230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:29,688-Speed 3280.09 samples/sec Loss 1.7594 LearningRate 0.0048 Epoch: 15 Global Step: 194240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:32,792-Speed 3299.84 samples/sec Loss 1.7581 LearningRate 0.0048 Epoch: 15 Global Step: 194250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:35,919-Speed 3276.07 samples/sec Loss 1.7856 LearningRate 0.0048 Epoch: 15 Global Step: 194260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:39,122-Speed 3197.98 samples/sec Loss 1.7477 LearningRate 0.0048 Epoch: 15 Global Step: 194270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:42,244-Speed 3281.48 samples/sec Loss 1.7618 LearningRate 0.0047 Epoch: 15 Global Step: 194280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:45,334-Speed 3315.13 samples/sec Loss 1.7891 LearningRate 0.0047 Epoch: 15 Global Step: 194290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:34:48,386-Speed 3355.75 samples/sec Loss 1.7586 LearningRate 0.0047 Epoch: 15 Global Step: 194300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:34:51,547-Speed 3240.86 samples/sec Loss 1.7879 LearningRate 0.0047 Epoch: 15 Global Step: 194310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:54,713-Speed 3235.17 samples/sec Loss 1.7591 LearningRate 0.0047 Epoch: 15 Global Step: 194320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:34:57,837-Speed 3279.35 samples/sec Loss 1.7372 LearningRate 0.0047 Epoch: 15 Global Step: 194330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:00,993-Speed 3245.76 samples/sec Loss 1.7564 LearningRate 0.0047 Epoch: 15 Global 
Step: 194340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:04,142-Speed 3253.31 samples/sec Loss 1.7829 LearningRate 0.0047 Epoch: 15 Global Step: 194350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:07,278-Speed 3265.51 samples/sec Loss 1.8028 LearningRate 0.0047 Epoch: 15 Global Step: 194360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:10,384-Speed 3298.45 samples/sec Loss 1.7315 LearningRate 0.0047 Epoch: 15 Global Step: 194370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:13,499-Speed 3288.74 samples/sec Loss 1.7503 LearningRate 0.0047 Epoch: 15 Global Step: 194380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:16,588-Speed 3316.33 samples/sec Loss 1.6951 LearningRate 0.0047 Epoch: 15 Global Step: 194390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:19,693-Speed 3297.78 samples/sec Loss 1.6933 LearningRate 0.0047 Epoch: 15 Global Step: 194400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:22,797-Speed 3301.00 samples/sec Loss 1.8167 LearningRate 0.0047 Epoch: 15 Global Step: 194410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:35:25,858-Speed 3346.16 samples/sec Loss 1.7432 LearningRate 0.0047 Epoch: 15 Global Step: 194420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:28,983-Speed 3277.46 samples/sec Loss 1.7742 LearningRate 0.0047 Epoch: 15 Global Step: 194430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:32,072-Speed 3315.96 samples/sec Loss 1.7744 LearningRate 0.0047 Epoch: 15 Global Step: 194440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:35,191-Speed 3284.03 samples/sec Loss 1.7515 LearningRate 0.0047 Epoch: 15 Global Step: 194450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:38,269-Speed 3328.29 samples/sec Loss 1.7677 LearningRate 0.0047 Epoch: 15 Global Step: 194460 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 18:35:41,345-Speed 3330.20 samples/sec Loss 1.7949 LearningRate 0.0047 Epoch: 15 Global Step: 194470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:44,406-Speed 3346.33 samples/sec Loss 1.7958 LearningRate 0.0047 Epoch: 15 Global Step: 194480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:47,527-Speed 3282.63 samples/sec Loss 1.7749 LearningRate 0.0047 Epoch: 15 Global Step: 194490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:50,602-Speed 3331.17 samples/sec Loss 1.7679 LearningRate 0.0047 Epoch: 15 Global Step: 194500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:53,798-Speed 3204.99 samples/sec Loss 1.8079 LearningRate 0.0047 Epoch: 15 Global Step: 194510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:35:56,868-Speed 3336.92 samples/sec Loss 1.7364 LearningRate 0.0047 Epoch: 15 Global Step: 194520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:35:59,963-Speed 3309.30 samples/sec Loss 1.7434 LearningRate 0.0047 Epoch: 15 Global Step: 194530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:03,087-Speed 3279.40 samples/sec Loss 1.8077 LearningRate 0.0047 Epoch: 15 Global Step: 194540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:06,217-Speed 3272.79 samples/sec Loss 1.7616 LearningRate 0.0047 Epoch: 15 Global Step: 194550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:09,275-Speed 3349.07 samples/sec Loss 1.8285 LearningRate 0.0047 Epoch: 15 Global Step: 194560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:12,432-Speed 3244.91 samples/sec Loss 1.7536 LearningRate 0.0047 Epoch: 15 Global Step: 194570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:15,530-Speed 3306.73 samples/sec Loss 1.8027 LearningRate 0.0047 Epoch: 15 Global Step: 194580 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 
18:36:18,592-Speed 3344.88 samples/sec Loss 1.7542 LearningRate 0.0047 Epoch: 15 Global Step: 194590 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:21,655-Speed 3344.11 samples/sec Loss 1.7590 LearningRate 0.0047 Epoch: 15 Global Step: 194600 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:24,743-Speed 3317.62 samples/sec Loss 1.7733 LearningRate 0.0047 Epoch: 15 Global Step: 194610 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:27,848-Speed 3298.83 samples/sec Loss 1.7990 LearningRate 0.0047 Epoch: 15 Global Step: 194620 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:30,944-Speed 3308.74 samples/sec Loss 1.8237 LearningRate 0.0047 Epoch: 15 Global Step: 194630 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:34,036-Speed 3313.26 samples/sec Loss 1.6719 LearningRate 0.0047 Epoch: 15 Global Step: 194640 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:37,152-Speed 3286.89 samples/sec Loss 1.7551 LearningRate 0.0047 Epoch: 15 Global Step: 194650 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:40,241-Speed 3315.95 samples/sec Loss 1.7764 LearningRate 0.0047 Epoch: 15 Global Step: 194660 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:43,390-Speed 3252.80 samples/sec Loss 1.7181 LearningRate 0.0047 Epoch: 15 Global Step: 194670 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:36:46,495-Speed 3299.47 samples/sec Loss 1.7502 LearningRate 0.0047 Epoch: 15 Global Step: 194680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:49,545-Speed 3358.01 samples/sec Loss 1.8380 LearningRate 0.0047 Epoch: 15 Global Step: 194690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:52,622-Speed 3329.81 samples/sec Loss 1.7805 LearningRate 0.0047 Epoch: 15 Global Step: 194700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:55,725-Speed 3300.25 samples/sec Loss 1.7760 
LearningRate 0.0047 Epoch: 15 Global Step: 194710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:36:58,761-Speed 3374.54 samples/sec Loss 1.6580 LearningRate 0.0047 Epoch: 15 Global Step: 194720 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:01,885-Speed 3277.97 samples/sec Loss 1.7245 LearningRate 0.0047 Epoch: 15 Global Step: 194730 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:04,978-Speed 3312.07 samples/sec Loss 1.7297 LearningRate 0.0047 Epoch: 15 Global Step: 194740 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:08,068-Speed 3315.43 samples/sec Loss 1.7222 LearningRate 0.0047 Epoch: 15 Global Step: 194750 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:11,183-Speed 3288.05 samples/sec Loss 1.7261 LearningRate 0.0047 Epoch: 15 Global Step: 194760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:14,333-Speed 3251.68 samples/sec Loss 1.7597 LearningRate 0.0047 Epoch: 15 Global Step: 194770 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:17,453-Speed 3283.94 samples/sec Loss 1.8328 LearningRate 0.0047 Epoch: 15 Global Step: 194780 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:20,558-Speed 3298.21 samples/sec Loss 1.7571 LearningRate 0.0047 Epoch: 15 Global Step: 194790 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:23,638-Speed 3325.88 samples/sec Loss 1.7753 LearningRate 0.0047 Epoch: 15 Global Step: 194800 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:26,728-Speed 3315.09 samples/sec Loss 1.7118 LearningRate 0.0047 Epoch: 15 Global Step: 194810 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:29,850-Speed 3281.23 samples/sec Loss 1.8022 LearningRate 0.0047 Epoch: 15 Global Step: 194820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:32,972-Speed 3280.66 samples/sec Loss 1.7753 LearningRate 0.0047 Epoch: 15 Global Step: 194830 
Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:36,083-Speed 3292.56 samples/sec Loss 1.7714 LearningRate 0.0047 Epoch: 15 Global Step: 194840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:39,190-Speed 3297.04 samples/sec Loss 1.7585 LearningRate 0.0047 Epoch: 15 Global Step: 194850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:42,319-Speed 3274.31 samples/sec Loss 1.7502 LearningRate 0.0046 Epoch: 15 Global Step: 194860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:45,416-Speed 3307.28 samples/sec Loss 1.8265 LearningRate 0.0046 Epoch: 15 Global Step: 194870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:48,620-Speed 3196.55 samples/sec Loss 1.7281 LearningRate 0.0046 Epoch: 15 Global Step: 194880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:51,691-Speed 3336.18 samples/sec Loss 1.7586 LearningRate 0.0046 Epoch: 15 Global Step: 194890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:37:54,812-Speed 3281.54 samples/sec Loss 1.7218 LearningRate 0.0046 Epoch: 15 Global Step: 194900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:37:57,876-Speed 3343.12 samples/sec Loss 1.7799 LearningRate 0.0046 Epoch: 15 Global Step: 194910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:00,936-Speed 3348.16 samples/sec Loss 1.7499 LearningRate 0.0046 Epoch: 15 Global Step: 194920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:04,077-Speed 3261.21 samples/sec Loss 1.7741 LearningRate 0.0046 Epoch: 15 Global Step: 194930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:07,161-Speed 3321.28 samples/sec Loss 1.7634 LearningRate 0.0046 Epoch: 15 Global Step: 194940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:10,217-Speed 3351.99 samples/sec Loss 1.7563 LearningRate 0.0046 Epoch: 15 Global Step: 194950 Fp16 Grad Scale: 8192 Required: 5 hours 
Training: 2022-04-27 18:38:13,296-Speed 3326.35 samples/sec Loss 1.7801 LearningRate 0.0046 Epoch: 15 Global Step: 194960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:16,426-Speed 3272.57 samples/sec Loss 1.7664 LearningRate 0.0046 Epoch: 15 Global Step: 194970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:19,486-Speed 3347.45 samples/sec Loss 1.7276 LearningRate 0.0046 Epoch: 15 Global Step: 194980 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:22,560-Speed 3333.15 samples/sec Loss 1.7672 LearningRate 0.0046 Epoch: 15 Global Step: 194990 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:25,616-Speed 3351.56 samples/sec Loss 1.8074 LearningRate 0.0046 Epoch: 15 Global Step: 195000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:38:28,681-Speed 3342.42 samples/sec Loss 1.7453 LearningRate 0.0046 Epoch: 15 Global Step: 195010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:38:31,776-Speed 3308.86 samples/sec Loss 1.7736 LearningRate 0.0046 Epoch: 15 Global Step: 195020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:38:34,848-Speed 3334.73 samples/sec Loss 1.7587 LearningRate 0.0046 Epoch: 15 Global Step: 195030 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:37,931-Speed 3321.95 samples/sec Loss 1.7693 LearningRate 0.0046 Epoch: 15 Global Step: 195040 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:41,001-Speed 3337.04 samples/sec Loss 1.7711 LearningRate 0.0046 Epoch: 15 Global Step: 195050 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:44,117-Speed 3287.10 samples/sec Loss 1.7836 LearningRate 0.0046 Epoch: 15 Global Step: 195060 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:47,241-Speed 3279.10 samples/sec Loss 1.7124 LearningRate 0.0046 Epoch: 15 Global Step: 195070 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:50,312-Speed 3335.02 
samples/sec Loss 1.7958 LearningRate 0.0046 Epoch: 15 Global Step: 195080 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:53,395-Speed 3323.12 samples/sec Loss 1.7409 LearningRate 0.0046 Epoch: 15 Global Step: 195090 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:56,478-Speed 3322.11 samples/sec Loss 1.8122 LearningRate 0.0046 Epoch: 15 Global Step: 195100 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:38:59,606-Speed 3274.83 samples/sec Loss 1.8169 LearningRate 0.0046 Epoch: 15 Global Step: 195110 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:02,678-Speed 3334.07 samples/sec Loss 1.7602 LearningRate 0.0046 Epoch: 15 Global Step: 195120 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:05,770-Speed 3313.08 samples/sec Loss 1.7557 LearningRate 0.0046 Epoch: 15 Global Step: 195130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:39:08,829-Speed 3348.91 samples/sec Loss 1.6831 LearningRate 0.0046 Epoch: 15 Global Step: 195140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:39:11,909-Speed 3325.53 samples/sec Loss 1.8094 LearningRate 0.0046 Epoch: 15 Global Step: 195150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:39:15,030-Speed 3282.50 samples/sec Loss 1.7541 LearningRate 0.0046 Epoch: 15 Global Step: 195160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:39:18,177-Speed 3254.91 samples/sec Loss 1.7514 LearningRate 0.0046 Epoch: 15 Global Step: 195170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:39:21,243-Speed 3341.34 samples/sec Loss 1.7768 LearningRate 0.0046 Epoch: 15 Global Step: 195180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:39:24,415-Speed 3228.46 samples/sec Loss 1.7635 LearningRate 0.0046 Epoch: 15 Global Step: 195190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:39:27,561-Speed 3256.94 samples/sec Loss 1.7538 LearningRate 0.0046 
Epoch: 15 Global Step: 195200 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:30,691-Speed 3272.54 samples/sec Loss 1.7550 LearningRate 0.0046 Epoch: 15 Global Step: 195210 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:33,751-Speed 3346.51 samples/sec Loss 1.7461 LearningRate 0.0046 Epoch: 15 Global Step: 195220 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:36,969-Speed 3183.24 samples/sec Loss 1.7724 LearningRate 0.0046 Epoch: 15 Global Step: 195230 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:40,102-Speed 3269.76 samples/sec Loss 1.7979 LearningRate 0.0046 Epoch: 15 Global Step: 195240 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:43,268-Speed 3235.17 samples/sec Loss 1.7906 LearningRate 0.0046 Epoch: 15 Global Step: 195250 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:46,362-Speed 3311.32 samples/sec Loss 1.7535 LearningRate 0.0046 Epoch: 15 Global Step: 195260 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:49,513-Speed 3250.44 samples/sec Loss 1.7896 LearningRate 0.0046 Epoch: 15 Global Step: 195270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:52,658-Speed 3256.73 samples/sec Loss 1.6938 LearningRate 0.0046 Epoch: 15 Global Step: 195280 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:55,731-Speed 3333.09 samples/sec Loss 1.7715 LearningRate 0.0046 Epoch: 15 Global Step: 195290 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:39:58,828-Speed 3307.90 samples/sec Loss 1.7526 LearningRate 0.0046 Epoch: 15 Global Step: 195300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:01,933-Speed 3298.56 samples/sec Loss 1.7763 LearningRate 0.0046 Epoch: 15 Global Step: 195310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:05,115-Speed 3219.10 samples/sec Loss 1.7932 LearningRate 0.0046 Epoch: 15 Global Step: 195320 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 18:40:08,250-Speed 3267.09 samples/sec Loss 1.7157 LearningRate 0.0046 Epoch: 15 Global Step: 195330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:11,336-Speed 3319.13 samples/sec Loss 1.8118 LearningRate 0.0046 Epoch: 15 Global Step: 195340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:14,456-Speed 3283.38 samples/sec Loss 1.7565 LearningRate 0.0046 Epoch: 15 Global Step: 195350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:17,590-Speed 3269.26 samples/sec Loss 1.8209 LearningRate 0.0046 Epoch: 15 Global Step: 195360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:20,725-Speed 3266.92 samples/sec Loss 1.8237 LearningRate 0.0046 Epoch: 15 Global Step: 195370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:23,858-Speed 3269.94 samples/sec Loss 1.7150 LearningRate 0.0046 Epoch: 15 Global Step: 195380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:26,925-Speed 3339.73 samples/sec Loss 1.8177 LearningRate 0.0046 Epoch: 15 Global Step: 195390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:30,005-Speed 3325.02 samples/sec Loss 1.7674 LearningRate 0.0046 Epoch: 15 Global Step: 195400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:40:33,110-Speed 3299.54 samples/sec Loss 1.7920 LearningRate 0.0046 Epoch: 15 Global Step: 195410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:36,171-Speed 3346.11 samples/sec Loss 1.8425 LearningRate 0.0046 Epoch: 15 Global Step: 195420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:39,364-Speed 3209.03 samples/sec Loss 1.8088 LearningRate 0.0046 Epoch: 15 Global Step: 195430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:42,457-Speed 3311.71 samples/sec Loss 1.8150 LearningRate 0.0045 Epoch: 15 Global Step: 195440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
18:40:45,515-Speed 3349.47 samples/sec Loss 1.7365 LearningRate 0.0045 Epoch: 15 Global Step: 195450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:48,656-Speed 3260.78 samples/sec Loss 1.7680 LearningRate 0.0045 Epoch: 15 Global Step: 195460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:51,827-Speed 3230.81 samples/sec Loss 1.6787 LearningRate 0.0045 Epoch: 15 Global Step: 195470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:54,980-Speed 3248.69 samples/sec Loss 1.7449 LearningRate 0.0045 Epoch: 15 Global Step: 195480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:40:58,058-Speed 3328.23 samples/sec Loss 1.7734 LearningRate 0.0045 Epoch: 15 Global Step: 195490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:01,126-Speed 3338.64 samples/sec Loss 1.7923 LearningRate 0.0045 Epoch: 15 Global Step: 195500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:04,231-Speed 3298.28 samples/sec Loss 1.7541 LearningRate 0.0045 Epoch: 15 Global Step: 195510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:41:07,324-Speed 3312.03 samples/sec Loss 1.7355 LearningRate 0.0045 Epoch: 15 Global Step: 195520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:41:10,393-Speed 3337.32 samples/sec Loss 1.7880 LearningRate 0.0045 Epoch: 15 Global Step: 195530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:41:13,508-Speed 3288.64 samples/sec Loss 1.7762 LearningRate 0.0045 Epoch: 15 Global Step: 195540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:41:16,668-Speed 3241.88 samples/sec Loss 1.6952 LearningRate 0.0045 Epoch: 15 Global Step: 195550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:41:19,725-Speed 3350.82 samples/sec Loss 1.7885 LearningRate 0.0045 Epoch: 15 Global Step: 195560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:22,802-Speed 3329.30 samples/sec Loss 
1.7393 LearningRate 0.0045 Epoch: 15 Global Step: 195570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:25,931-Speed 3273.44 samples/sec Loss 1.7455 LearningRate 0.0045 Epoch: 15 Global Step: 195580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:29,090-Speed 3242.63 samples/sec Loss 1.7479 LearningRate 0.0045 Epoch: 15 Global Step: 195590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:32,161-Speed 3335.32 samples/sec Loss 1.7553 LearningRate 0.0045 Epoch: 15 Global Step: 195600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:35,317-Speed 3245.70 samples/sec Loss 1.7328 LearningRate 0.0045 Epoch: 15 Global Step: 195610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:38,438-Speed 3282.49 samples/sec Loss 1.7009 LearningRate 0.0045 Epoch: 15 Global Step: 195620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:41,546-Speed 3295.63 samples/sec Loss 1.7489 LearningRate 0.0045 Epoch: 15 Global Step: 195630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:44,692-Speed 3255.50 samples/sec Loss 1.7597 LearningRate 0.0045 Epoch: 15 Global Step: 195640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:47,830-Speed 3264.74 samples/sec Loss 1.7672 LearningRate 0.0045 Epoch: 15 Global Step: 195650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:50,983-Speed 3248.69 samples/sec Loss 1.7547 LearningRate 0.0045 Epoch: 15 Global Step: 195660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:41:54,034-Speed 3357.93 samples/sec Loss 1.7457 LearningRate 0.0045 Epoch: 15 Global Step: 195670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:41:57,107-Speed 3333.76 samples/sec Loss 1.7502 LearningRate 0.0045 Epoch: 15 Global Step: 195680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:42:00,156-Speed 3359.64 samples/sec Loss 1.7183 LearningRate 0.0045 Epoch: 15 Global 
Step: 195690 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:03,292-Speed 3265.81 samples/sec Loss 1.6868 LearningRate 0.0045 Epoch: 15 Global Step: 195700 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:06,430-Speed 3265.00 samples/sec Loss 1.7225 LearningRate 0.0045 Epoch: 15 Global Step: 195710 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:09,549-Speed 3284.07 samples/sec Loss 1.7562 LearningRate 0.0045 Epoch: 15 Global Step: 195720 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:12,614-Speed 3342.04 samples/sec Loss 1.7647 LearningRate 0.0045 Epoch: 15 Global Step: 195730 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:15,775-Speed 3239.99 samples/sec Loss 1.7932 LearningRate 0.0045 Epoch: 15 Global Step: 195740 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:18,872-Speed 3307.85 samples/sec Loss 1.7261 LearningRate 0.0045 Epoch: 15 Global Step: 195750 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:21,938-Speed 3340.71 samples/sec Loss 1.8138 LearningRate 0.0045 Epoch: 15 Global Step: 195760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:25,035-Speed 3307.66 samples/sec Loss 1.7721 LearningRate 0.0045 Epoch: 15 Global Step: 195770 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:42:28,105-Speed 3336.93 samples/sec Loss 1.7153 LearningRate 0.0045 Epoch: 15 Global Step: 195780 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:31,189-Speed 3321.35 samples/sec Loss 1.8090 LearningRate 0.0045 Epoch: 15 Global Step: 195790 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:34,245-Speed 3352.22 samples/sec Loss 1.7596 LearningRate 0.0045 Epoch: 15 Global Step: 195800 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:37,343-Speed 3306.20 samples/sec Loss 1.7790 LearningRate 0.0045 Epoch: 15 Global Step: 195810 Fp16 Grad Scale: 4096 Required: 5 hours 
Training: 2022-04-27 18:42:40,400-Speed 3350.71 samples/sec Loss 1.7511 LearningRate 0.0045 Epoch: 15 Global Step: 195820 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:43,477-Speed 3328.29 samples/sec Loss 1.7470 LearningRate 0.0045 Epoch: 15 Global Step: 195830 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:46,578-Speed 3303.29 samples/sec Loss 1.7527 LearningRate 0.0045 Epoch: 15 Global Step: 195840 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:49,653-Speed 3332.00 samples/sec Loss 1.7745 LearningRate 0.0045 Epoch: 15 Global Step: 195850 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:52,801-Speed 3253.74 samples/sec Loss 1.7654 LearningRate 0.0045 Epoch: 15 Global Step: 195860 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:55,878-Speed 3328.36 samples/sec Loss 1.8294 LearningRate 0.0045 Epoch: 15 Global Step: 195870 Fp16 Grad Scale: 4096 Required: 5 hours Training: 2022-04-27 18:42:59,057-Speed 3222.64 samples/sec Loss 1.7555 LearningRate 0.0045 Epoch: 15 Global Step: 195880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:02,182-Speed 3277.88 samples/sec Loss 1.7804 LearningRate 0.0045 Epoch: 15 Global Step: 195890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:05,330-Speed 3253.51 samples/sec Loss 1.7379 LearningRate 0.0045 Epoch: 15 Global Step: 195900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:08,479-Speed 3252.45 samples/sec Loss 1.7520 LearningRate 0.0045 Epoch: 15 Global Step: 195910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:11,553-Speed 3332.95 samples/sec Loss 1.7703 LearningRate 0.0045 Epoch: 15 Global Step: 195920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:14,693-Speed 3262.05 samples/sec Loss 1.7441 LearningRate 0.0045 Epoch: 15 Global Step: 195930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:17,888-Speed 3206.30 
samples/sec Loss 1.7505 LearningRate 0.0045 Epoch: 15 Global Step: 195940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:21,003-Speed 3288.82 samples/sec Loss 1.7685 LearningRate 0.0045 Epoch: 15 Global Step: 195950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:24,220-Speed 3183.94 samples/sec Loss 1.7795 LearningRate 0.0045 Epoch: 15 Global Step: 195960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:27,391-Speed 3230.00 samples/sec Loss 1.8052 LearningRate 0.0045 Epoch: 15 Global Step: 195970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:43:30,575-Speed 3217.01 samples/sec Loss 1.7557 LearningRate 0.0045 Epoch: 15 Global Step: 195980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:33,662-Speed 3318.66 samples/sec Loss 1.7786 LearningRate 0.0045 Epoch: 15 Global Step: 195990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:36,840-Speed 3222.89 samples/sec Loss 1.7729 LearningRate 0.0045 Epoch: 15 Global Step: 196000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:39,976-Speed 3265.85 samples/sec Loss 1.7354 LearningRate 0.0045 Epoch: 15 Global Step: 196010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:43,062-Speed 3319.21 samples/sec Loss 1.7386 LearningRate 0.0044 Epoch: 15 Global Step: 196020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:46,235-Speed 3228.95 samples/sec Loss 1.7466 LearningRate 0.0044 Epoch: 15 Global Step: 196030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:49,441-Speed 3194.87 samples/sec Loss 1.7853 LearningRate 0.0044 Epoch: 15 Global Step: 196040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:52,599-Speed 3243.17 samples/sec Loss 1.7675 LearningRate 0.0044 Epoch: 15 Global Step: 196050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:55,760-Speed 3240.60 samples/sec Loss 1.8274 LearningRate 0.0044 
Epoch: 15 Global Step: 196060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:43:58,870-Speed 3293.95 samples/sec Loss 1.8056 LearningRate 0.0044 Epoch: 15 Global Step: 196070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:01,948-Speed 3328.07 samples/sec Loss 1.7536 LearningRate 0.0044 Epoch: 15 Global Step: 196080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:44:05,071-Speed 3280.05 samples/sec Loss 1.8111 LearningRate 0.0044 Epoch: 15 Global Step: 196090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:44:08,128-Speed 3350.47 samples/sec Loss 1.8244 LearningRate 0.0044 Epoch: 15 Global Step: 196100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:11,261-Speed 3269.31 samples/sec Loss 1.7778 LearningRate 0.0044 Epoch: 15 Global Step: 196110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:14,376-Speed 3288.86 samples/sec Loss 1.8181 LearningRate 0.0044 Epoch: 15 Global Step: 196120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:17,509-Speed 3268.70 samples/sec Loss 1.7100 LearningRate 0.0044 Epoch: 15 Global Step: 196130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:20,598-Speed 3316.08 samples/sec Loss 1.7625 LearningRate 0.0044 Epoch: 15 Global Step: 196140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:23,694-Speed 3308.83 samples/sec Loss 1.7722 LearningRate 0.0044 Epoch: 15 Global Step: 196150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:26,834-Speed 3262.65 samples/sec Loss 1.7089 LearningRate 0.0044 Epoch: 15 Global Step: 196160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:29,962-Speed 3274.31 samples/sec Loss 1.7561 LearningRate 0.0044 Epoch: 15 Global Step: 196170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:33,070-Speed 3296.11 samples/sec Loss 1.7690 LearningRate 0.0044 Epoch: 15 Global Step: 196180 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:36,234-Speed 3237.12 samples/sec Loss 1.7714 LearningRate 0.0044 Epoch: 15 Global Step: 196190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:39,359-Speed 3278.03 samples/sec Loss 1.7743 LearningRate 0.0044 Epoch: 15 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:44:42,527-Speed 3233.31 samples/sec Loss 1.7983 LearningRate 0.0044 Epoch: 15 Global Step: 196210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 18:44:45,613-Speed 3319.10 samples/sec Loss 1.7840 LearningRate 0.0044 Epoch: 15 Global Step: 196220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:48,740-Speed 3275.35 samples/sec Loss 1.7617 LearningRate 0.0044 Epoch: 15 Global Step: 196230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:51,809-Speed 3338.16 samples/sec Loss 1.7397 LearningRate 0.0044 Epoch: 15 Global Step: 196240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:54,937-Speed 3274.93 samples/sec Loss 1.7636 LearningRate 0.0044 Epoch: 15 Global Step: 196250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:44:57,990-Speed 3355.00 samples/sec Loss 1.7704 LearningRate 0.0044 Epoch: 15 Global Step: 196260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 18:45:01,082-Speed 3312.83 samples/sec Loss 1.7883 LearningRate 0.0044 Epoch: 15 Global Step: 196270 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:04,182-Speed 3304.45 samples/sec Loss 1.7379 LearningRate 0.0044 Epoch: 15 Global Step: 196280 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:07,277-Speed 3309.05 samples/sec Loss 1.7186 LearningRate 0.0044 Epoch: 15 Global Step: 196290 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:10,421-Speed 3257.78 samples/sec Loss 1.7753 LearningRate 0.0044 Epoch: 15 Global Step: 196300 Fp16 Grad Scale: 8192 Required: 5 hours Training: 
2022-04-27 18:45:13,594-Speed 3228.95 samples/sec Loss 1.7096 LearningRate 0.0044 Epoch: 15 Global Step: 196310 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:16,679-Speed 3320.29 samples/sec Loss 1.7464 LearningRate 0.0044 Epoch: 15 Global Step: 196320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:19,781-Speed 3301.44 samples/sec Loss 1.7832 LearningRate 0.0044 Epoch: 15 Global Step: 196330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:22,848-Speed 3340.01 samples/sec Loss 1.7934 LearningRate 0.0044 Epoch: 15 Global Step: 196340 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:25,924-Speed 3330.27 samples/sec Loss 1.7912 LearningRate 0.0044 Epoch: 15 Global Step: 196350 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:29,015-Speed 3313.56 samples/sec Loss 1.7780 LearningRate 0.0044 Epoch: 15 Global Step: 196360 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 18:45:32,136-Speed 3282.62 samples/sec Loss 1.7404 LearningRate 0.0044 Epoch: 15 Global Step: 196370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:45:35,335-Speed 3202.31 samples/sec Loss 1.7822 LearningRate 0.0044 Epoch: 15 Global Step: 196380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:45:38,394-Speed 3349.22 samples/sec Loss 1.7592 LearningRate 0.0044 Epoch: 15 Global Step: 196390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:45:41,581-Speed 3213.12 samples/sec Loss 1.7448 LearningRate 0.0044 Epoch: 15 Global Step: 196400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:45:44,646-Speed 3343.07 samples/sec Loss 1.7969 LearningRate 0.0044 Epoch: 15 Global Step: 196410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:45:47,788-Speed 3260.11 samples/sec Loss 1.7677 LearningRate 0.0044 Epoch: 15 Global Step: 196420 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:45:50,878-Speed 3314.19 samples/sec 
Loss 1.7527 LearningRate 0.0044 Epoch: 15 Global Step: 196430 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:45:54,022-Speed 3258.71 samples/sec Loss 1.7593 LearningRate 0.0044 Epoch: 15 Global Step: 196440 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:45:57,700-Speed 2784.54 samples/sec Loss 1.7549 LearningRate 0.0044 Epoch: 15 Global Step: 196450 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:46:00,771-Speed 3335.73 samples/sec Loss 1.7726 LearningRate 0.0044 Epoch: 15 Global Step: 196460 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:46:03,976-Speed 3195.70 samples/sec Loss 1.6865 LearningRate 0.0044 Epoch: 15 Global Step: 196470 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:46:07,192-Speed 3185.93 samples/sec Loss 1.7309 LearningRate 0.0044 Epoch: 15 Global Step: 196480 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:46:10,311-Speed 3283.73 samples/sec Loss 1.7236 LearningRate 0.0044 Epoch: 15 Global Step: 196490 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:46:13,422-Speed 3292.28 samples/sec Loss 1.7452 LearningRate 0.0044 Epoch: 15 Global Step: 196500 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:46:16,488-Speed 3341.59 samples/sec Loss 1.7688 LearningRate 0.0044 Epoch: 15 Global Step: 196510 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:46:19,581-Speed 3311.22 samples/sec Loss 1.7966 LearningRate 0.0044 Epoch: 15 Global Step: 196520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:22,678-Speed 3307.52 samples/sec Loss 1.7350 LearningRate 0.0044 Epoch: 15 Global Step: 196530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:25,771-Speed 3311.87 samples/sec Loss 1.8149 LearningRate 0.0044 Epoch: 15 Global Step: 196540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:28,849-Speed 3327.86 samples/sec Loss 1.7718 LearningRate 0.0044 Epoch: 15 Global 
Step: 196550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:31,903-Speed 3354.47 samples/sec Loss 1.7888 LearningRate 0.0044 Epoch: 15 Global Step: 196560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:35,017-Speed 3289.16 samples/sec Loss 1.7775 LearningRate 0.0044 Epoch: 15 Global Step: 196570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:38,129-Speed 3291.90 samples/sec Loss 1.7298 LearningRate 0.0044 Epoch: 15 Global Step: 196580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:41,251-Speed 3280.90 samples/sec Loss 1.7841 LearningRate 0.0044 Epoch: 15 Global Step: 196590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:44,306-Speed 3352.51 samples/sec Loss 1.7808 LearningRate 0.0044 Epoch: 15 Global Step: 196600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:47,392-Speed 3319.88 samples/sec Loss 1.8134 LearningRate 0.0043 Epoch: 15 Global Step: 196610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:46:50,534-Speed 3259.29 samples/sec Loss 1.7744 LearningRate 0.0043 Epoch: 15 Global Step: 196620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:46:53,649-Speed 3288.91 samples/sec Loss 1.8112 LearningRate 0.0043 Epoch: 15 Global Step: 196630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:46:56,751-Speed 3302.31 samples/sec Loss 1.7960 LearningRate 0.0043 Epoch: 15 Global Step: 196640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:46:59,785-Speed 3375.84 samples/sec Loss 1.7387 LearningRate 0.0043 Epoch: 15 Global Step: 196650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:02,865-Speed 3326.09 samples/sec Loss 1.7524 LearningRate 0.0043 Epoch: 15 Global Step: 196660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:05,966-Speed 3302.42 samples/sec Loss 1.8024 LearningRate 0.0043 Epoch: 15 Global Step: 196670 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 18:47:09,044-Speed 3328.34 samples/sec Loss 1.8030 LearningRate 0.0043 Epoch: 15 Global Step: 196680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:12,105-Speed 3346.34 samples/sec Loss 1.7532 LearningRate 0.0043 Epoch: 15 Global Step: 196690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:15,201-Speed 3309.08 samples/sec Loss 1.7281 LearningRate 0.0043 Epoch: 15 Global Step: 196700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:18,252-Speed 3356.77 samples/sec Loss 1.7530 LearningRate 0.0043 Epoch: 15 Global Step: 196710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:21,334-Speed 3323.72 samples/sec Loss 1.7363 LearningRate 0.0043 Epoch: 15 Global Step: 196720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:24,422-Speed 3316.90 samples/sec Loss 1.7622 LearningRate 0.0043 Epoch: 15 Global Step: 196730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:27,537-Speed 3288.29 samples/sec Loss 1.7896 LearningRate 0.0043 Epoch: 15 Global Step: 196740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:47:31,182-Speed 2810.29 samples/sec Loss 1.7784 LearningRate 0.0043 Epoch: 15 Global Step: 196750 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:47:34,245-Speed 3344.72 samples/sec Loss 1.7058 LearningRate 0.0043 Epoch: 15 Global Step: 196760 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:47:37,976-Speed 2744.91 samples/sec Loss 1.7453 LearningRate 0.0043 Epoch: 15 Global Step: 196770 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:47:43,537-Speed 1841.69 samples/sec Loss 1.7647 LearningRate 0.0043 Epoch: 15 Global Step: 196780 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:47:46,608-Speed 3335.66 samples/sec Loss 1.7741 LearningRate 0.0043 Epoch: 15 Global Step: 196790 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 
18:47:49,691-Speed 3323.12 samples/sec Loss 1.7986 LearningRate 0.0043 Epoch: 15 Global Step: 196800 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:47:52,776-Speed 3319.47 samples/sec Loss 1.7833 LearningRate 0.0043 Epoch: 15 Global Step: 196810 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:47:55,966-Speed 3211.11 samples/sec Loss 1.7254 LearningRate 0.0043 Epoch: 15 Global Step: 196820 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:47:59,058-Speed 3313.20 samples/sec Loss 1.7397 LearningRate 0.0043 Epoch: 15 Global Step: 196830 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:02,138-Speed 3326.33 samples/sec Loss 1.8250 LearningRate 0.0043 Epoch: 15 Global Step: 196840 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:05,277-Speed 3263.16 samples/sec Loss 1.7883 LearningRate 0.0043 Epoch: 15 Global Step: 196850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:48:08,376-Speed 3305.05 samples/sec Loss 1.7881 LearningRate 0.0043 Epoch: 15 Global Step: 196860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:48:11,468-Speed 3313.01 samples/sec Loss 1.7708 LearningRate 0.0043 Epoch: 15 Global Step: 196870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:48:14,575-Speed 3296.89 samples/sec Loss 1.7534 LearningRate 0.0043 Epoch: 15 Global Step: 196880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:48:17,724-Speed 3252.98 samples/sec Loss 1.6955 LearningRate 0.0043 Epoch: 15 Global Step: 196890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:48:20,801-Speed 3329.15 samples/sec Loss 1.7290 LearningRate 0.0043 Epoch: 15 Global Step: 196900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:48:23,842-Speed 3367.92 samples/sec Loss 1.7397 LearningRate 0.0043 Epoch: 15 Global Step: 196910 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:26,939-Speed 3307.84 samples/sec Loss 
1.7759 LearningRate 0.0043 Epoch: 15 Global Step: 196920 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:30,034-Speed 3309.10 samples/sec Loss 1.8072 LearningRate 0.0043 Epoch: 15 Global Step: 196930 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:33,160-Speed 3277.42 samples/sec Loss 1.7468 LearningRate 0.0043 Epoch: 15 Global Step: 196940 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:36,311-Speed 3250.55 samples/sec Loss 1.7451 LearningRate 0.0043 Epoch: 15 Global Step: 196950 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:39,431-Speed 3283.92 samples/sec Loss 1.7593 LearningRate 0.0043 Epoch: 15 Global Step: 196960 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:42,540-Speed 3294.93 samples/sec Loss 1.7524 LearningRate 0.0043 Epoch: 15 Global Step: 196970 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:45,611-Speed 3335.18 samples/sec Loss 1.6986 LearningRate 0.0043 Epoch: 15 Global Step: 196980 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:48,738-Speed 3276.41 samples/sec Loss 1.7800 LearningRate 0.0043 Epoch: 15 Global Step: 196990 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:51,883-Speed 3256.99 samples/sec Loss 1.7942 LearningRate 0.0043 Epoch: 15 Global Step: 197000 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:48:55,030-Speed 3254.94 samples/sec Loss 1.7620 LearningRate 0.0043 Epoch: 15 Global Step: 197010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:48:58,087-Speed 3350.71 samples/sec Loss 1.7895 LearningRate 0.0043 Epoch: 15 Global Step: 197020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:01,274-Speed 3213.71 samples/sec Loss 1.7596 LearningRate 0.0043 Epoch: 15 Global Step: 197030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:04,483-Speed 3192.09 samples/sec Loss 1.8161 LearningRate 0.0043 Epoch: 15 Global Step: 
197040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:07,632-Speed 3253.37 samples/sec Loss 1.7804 LearningRate 0.0043 Epoch: 15 Global Step: 197050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:10,720-Speed 3316.99 samples/sec Loss 1.7117 LearningRate 0.0043 Epoch: 15 Global Step: 197060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:13,880-Speed 3241.35 samples/sec Loss 1.7728 LearningRate 0.0043 Epoch: 15 Global Step: 197070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:16,989-Speed 3294.86 samples/sec Loss 1.7631 LearningRate 0.0043 Epoch: 15 Global Step: 197080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:20,120-Speed 3271.42 samples/sec Loss 1.7783 LearningRate 0.0043 Epoch: 15 Global Step: 197090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:23,251-Speed 3271.86 samples/sec Loss 1.7740 LearningRate 0.0043 Epoch: 15 Global Step: 197100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:26,373-Speed 3280.95 samples/sec Loss 1.7688 LearningRate 0.0043 Epoch: 15 Global Step: 197110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:49:29,473-Speed 3303.91 samples/sec Loss 1.8078 LearningRate 0.0043 Epoch: 15 Global Step: 197120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:49:32,584-Speed 3292.51 samples/sec Loss 1.7336 LearningRate 0.0043 Epoch: 15 Global Step: 197130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:49:35,683-Speed 3305.65 samples/sec Loss 1.7533 LearningRate 0.0043 Epoch: 15 Global Step: 197140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:49:38,746-Speed 3344.75 samples/sec Loss 1.7719 LearningRate 0.0043 Epoch: 15 Global Step: 197150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:41,906-Speed 3241.35 samples/sec Loss 1.7921 LearningRate 0.0043 Epoch: 15 Global Step: 197160 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-04-27 18:49:45,033-Speed 3275.27 samples/sec Loss 1.7952 LearningRate 0.0043 Epoch: 15 Global Step: 197170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:48,188-Speed 3246.80 samples/sec Loss 1.7666 LearningRate 0.0043 Epoch: 15 Global Step: 197180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:51,367-Speed 3222.10 samples/sec Loss 1.7131 LearningRate 0.0043 Epoch: 15 Global Step: 197190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:54,512-Speed 3257.39 samples/sec Loss 1.7621 LearningRate 0.0043 Epoch: 15 Global Step: 197200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:49:57,598-Speed 3319.37 samples/sec Loss 1.7480 LearningRate 0.0042 Epoch: 15 Global Step: 197210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:00,790-Speed 3208.55 samples/sec Loss 1.7270 LearningRate 0.0042 Epoch: 15 Global Step: 197220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:03,957-Speed 3234.37 samples/sec Loss 1.7273 LearningRate 0.0042 Epoch: 15 Global Step: 197230 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:07,128-Speed 3230.04 samples/sec Loss 1.7906 LearningRate 0.0042 Epoch: 15 Global Step: 197240 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:10,219-Speed 3313.90 samples/sec Loss 1.7707 LearningRate 0.0042 Epoch: 15 Global Step: 197250 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:13,351-Speed 3271.79 samples/sec Loss 1.7548 LearningRate 0.0042 Epoch: 15 Global Step: 197260 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:16,498-Speed 3254.16 samples/sec Loss 1.8143 LearningRate 0.0042 Epoch: 15 Global Step: 197270 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:19,679-Speed 3220.53 samples/sec Loss 1.7546 LearningRate 0.0042 Epoch: 15 Global Step: 197280 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:22,769-Speed 
3314.78 samples/sec Loss 1.8002 LearningRate 0.0042 Epoch: 15 Global Step: 197290 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:25,919-Speed 3251.67 samples/sec Loss 1.7156 LearningRate 0.0042 Epoch: 15 Global Step: 197300 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:29,026-Speed 3297.73 samples/sec Loss 1.7709 LearningRate 0.0042 Epoch: 15 Global Step: 197310 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:32,128-Speed 3302.02 samples/sec Loss 1.8211 LearningRate 0.0042 Epoch: 15 Global Step: 197320 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:50:35,218-Speed 3314.35 samples/sec Loss 1.7592 LearningRate 0.0042 Epoch: 15 Global Step: 197330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:38,321-Speed 3301.39 samples/sec Loss 1.7105 LearningRate 0.0042 Epoch: 15 Global Step: 197340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:41,458-Speed 3265.43 samples/sec Loss 1.6810 LearningRate 0.0042 Epoch: 15 Global Step: 197350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:44,576-Speed 3286.00 samples/sec Loss 1.7463 LearningRate 0.0042 Epoch: 15 Global Step: 197360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:47,741-Speed 3236.37 samples/sec Loss 1.7688 LearningRate 0.0042 Epoch: 15 Global Step: 197370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:50,883-Speed 3258.93 samples/sec Loss 1.7856 LearningRate 0.0042 Epoch: 15 Global Step: 197380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:54,077-Speed 3207.45 samples/sec Loss 1.7879 LearningRate 0.0042 Epoch: 15 Global Step: 197390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:50:57,169-Speed 3313.07 samples/sec Loss 1.8155 LearningRate 0.0042 Epoch: 15 Global Step: 197400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:00,313-Speed 3257.44 samples/sec Loss 1.7730 LearningRate 
0.0042 Epoch: 15 Global Step: 197410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:03,451-Speed 3264.49 samples/sec Loss 1.7448 LearningRate 0.0042 Epoch: 15 Global Step: 197420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:06,656-Speed 3196.31 samples/sec Loss 1.7732 LearningRate 0.0042 Epoch: 15 Global Step: 197430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:51:09,750-Speed 3310.47 samples/sec Loss 1.7819 LearningRate 0.0042 Epoch: 15 Global Step: 197440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:51:12,916-Speed 3235.97 samples/sec Loss 1.7502 LearningRate 0.0042 Epoch: 15 Global Step: 197450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:51:16,134-Speed 3182.36 samples/sec Loss 1.7497 LearningRate 0.0042 Epoch: 15 Global Step: 197460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:51:19,209-Speed 3330.78 samples/sec Loss 1.7998 LearningRate 0.0042 Epoch: 15 Global Step: 197470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:51:22,284-Speed 3331.90 samples/sec Loss 1.8123 LearningRate 0.0042 Epoch: 15 Global Step: 197480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:25,381-Speed 3307.90 samples/sec Loss 1.7923 LearningRate 0.0042 Epoch: 15 Global Step: 197490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:28,447-Speed 3339.97 samples/sec Loss 1.7450 LearningRate 0.0042 Epoch: 15 Global Step: 197500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:31,554-Speed 3297.19 samples/sec Loss 1.8042 LearningRate 0.0042 Epoch: 15 Global Step: 197510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:34,682-Speed 3274.82 samples/sec Loss 1.7339 LearningRate 0.0042 Epoch: 15 Global Step: 197520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:37,859-Speed 3224.15 samples/sec Loss 1.7266 LearningRate 0.0042 Epoch: 15 Global Step: 197530 Fp16 
Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:40,964-Speed 3298.70 samples/sec Loss 1.7267 LearningRate 0.0042 Epoch: 15 Global Step: 197540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:44,113-Speed 3252.86 samples/sec Loss 1.7628 LearningRate 0.0042 Epoch: 15 Global Step: 197550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:47,228-Speed 3288.98 samples/sec Loss 1.6731 LearningRate 0.0042 Epoch: 15 Global Step: 197560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:50,372-Speed 3258.21 samples/sec Loss 1.7711 LearningRate 0.0042 Epoch: 15 Global Step: 197570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:51:53,452-Speed 3325.22 samples/sec Loss 1.7823 LearningRate 0.0042 Epoch: 15 Global Step: 197580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:51:56,587-Speed 3267.61 samples/sec Loss 1.7667 LearningRate 0.0042 Epoch: 15 Global Step: 197590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:51:59,659-Speed 3334.75 samples/sec Loss 1.7972 LearningRate 0.0042 Epoch: 15 Global Step: 197600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:02,764-Speed 3298.39 samples/sec Loss 1.7136 LearningRate 0.0042 Epoch: 15 Global Step: 197610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:05,853-Speed 3315.79 samples/sec Loss 1.7425 LearningRate 0.0042 Epoch: 15 Global Step: 197620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:08,938-Speed 3320.81 samples/sec Loss 1.7430 LearningRate 0.0042 Epoch: 15 Global Step: 197630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:12,106-Speed 3233.44 samples/sec Loss 1.7327 LearningRate 0.0042 Epoch: 15 Global Step: 197640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:15,269-Speed 3238.53 samples/sec Loss 1.8425 LearningRate 0.0042 Epoch: 15 Global Step: 197650 Fp16 Grad Scale: 16384 Required: 4 hours 
Training: 2022-04-27 18:52:18,364-Speed 3309.01 samples/sec Loss 1.8012 LearningRate 0.0042 Epoch: 15 Global Step: 197660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:21,487-Speed 3279.82 samples/sec Loss 1.8002 LearningRate 0.0042 Epoch: 15 Global Step: 197670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:24,611-Speed 3279.07 samples/sec Loss 1.7700 LearningRate 0.0042 Epoch: 15 Global Step: 197680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:27,739-Speed 3275.71 samples/sec Loss 1.7278 LearningRate 0.0042 Epoch: 15 Global Step: 197690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:30,917-Speed 3222.65 samples/sec Loss 1.7540 LearningRate 0.0042 Epoch: 15 Global Step: 197700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:34,008-Speed 3313.78 samples/sec Loss 1.7635 LearningRate 0.0042 Epoch: 15 Global Step: 197710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:37,161-Speed 3249.37 samples/sec Loss 1.7881 LearningRate 0.0042 Epoch: 15 Global Step: 197720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:40,297-Speed 3266.03 samples/sec Loss 1.8310 LearningRate 0.0042 Epoch: 15 Global Step: 197730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:43,410-Speed 3290.73 samples/sec Loss 1.7344 LearningRate 0.0042 Epoch: 15 Global Step: 197740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:46,495-Speed 3319.85 samples/sec Loss 1.7747 LearningRate 0.0042 Epoch: 15 Global Step: 197750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:49,591-Speed 3308.95 samples/sec Loss 1.7671 LearningRate 0.0042 Epoch: 15 Global Step: 197760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:52,701-Speed 3293.12 samples/sec Loss 1.7586 LearningRate 0.0042 Epoch: 15 Global Step: 197770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:55,822-Speed 
3282.80 samples/sec Loss 1.7169 LearningRate 0.0042 Epoch: 15 Global Step: 197780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:52:58,888-Speed 3341.03 samples/sec Loss 1.7673 LearningRate 0.0042 Epoch: 15 Global Step: 197790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:02,000-Speed 3290.38 samples/sec Loss 1.7566 LearningRate 0.0042 Epoch: 15 Global Step: 197800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:53:05,142-Speed 3260.72 samples/sec Loss 1.7350 LearningRate 0.0042 Epoch: 15 Global Step: 197810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:08,225-Speed 3322.31 samples/sec Loss 1.7537 LearningRate 0.0041 Epoch: 15 Global Step: 197820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:11,389-Speed 3237.64 samples/sec Loss 1.7892 LearningRate 0.0041 Epoch: 15 Global Step: 197830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:14,500-Speed 3292.30 samples/sec Loss 1.7535 LearningRate 0.0041 Epoch: 15 Global Step: 197840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:17,633-Speed 3270.32 samples/sec Loss 1.8275 LearningRate 0.0041 Epoch: 15 Global Step: 197850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:20,716-Speed 3321.97 samples/sec Loss 1.7629 LearningRate 0.0041 Epoch: 15 Global Step: 197860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:23,819-Speed 3300.99 samples/sec Loss 1.7504 LearningRate 0.0041 Epoch: 15 Global Step: 197870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:27,007-Speed 3213.13 samples/sec Loss 1.7718 LearningRate 0.0041 Epoch: 15 Global Step: 197880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:30,197-Speed 3210.78 samples/sec Loss 1.7839 LearningRate 0.0041 Epoch: 15 Global Step: 197890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:33,265-Speed 3338.77 samples/sec Loss 1.7163 
LearningRate 0.0041 Epoch: 15 Global Step: 197900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:36,405-Speed 3262.03 samples/sec Loss 1.7654 LearningRate 0.0041 Epoch: 15 Global Step: 197910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:53:39,580-Speed 3226.16 samples/sec Loss 1.8131 LearningRate 0.0041 Epoch: 15 Global Step: 197920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:42,838-Speed 3144.43 samples/sec Loss 1.7580 LearningRate 0.0041 Epoch: 15 Global Step: 197930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:45,909-Speed 3335.88 samples/sec Loss 1.7521 LearningRate 0.0041 Epoch: 15 Global Step: 197940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:49,012-Speed 3300.37 samples/sec Loss 1.7367 LearningRate 0.0041 Epoch: 15 Global Step: 197950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:52,132-Speed 3282.50 samples/sec Loss 1.7626 LearningRate 0.0041 Epoch: 15 Global Step: 197960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:55,200-Speed 3339.62 samples/sec Loss 1.7050 LearningRate 0.0041 Epoch: 15 Global Step: 197970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:53:58,305-Speed 3298.55 samples/sec Loss 1.8025 LearningRate 0.0041 Epoch: 15 Global Step: 197980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:01,502-Speed 3204.28 samples/sec Loss 1.7861 LearningRate 0.0041 Epoch: 15 Global Step: 197990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:04,603-Speed 3302.57 samples/sec Loss 1.6997 LearningRate 0.0041 Epoch: 15 Global Step: 198000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:07,731-Speed 3275.66 samples/sec Loss 1.7445 LearningRate 0.0041 Epoch: 15 Global Step: 198010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:10,838-Speed 3295.69 samples/sec Loss 1.7936 LearningRate 0.0041 Epoch: 15 Global Step: 
198020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:54:13,945-Speed 3297.77 samples/sec Loss 1.8272 LearningRate 0.0041 Epoch: 15 Global Step: 198030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:17,056-Speed 3291.82 samples/sec Loss 1.7440 LearningRate 0.0041 Epoch: 15 Global Step: 198040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:20,144-Speed 3317.15 samples/sec Loss 1.8386 LearningRate 0.0041 Epoch: 15 Global Step: 198050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:23,200-Speed 3352.33 samples/sec Loss 1.7492 LearningRate 0.0041 Epoch: 15 Global Step: 198060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:26,279-Speed 3326.15 samples/sec Loss 1.7827 LearningRate 0.0041 Epoch: 15 Global Step: 198070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:29,391-Speed 3291.30 samples/sec Loss 1.6923 LearningRate 0.0041 Epoch: 15 Global Step: 198080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:32,451-Speed 3347.69 samples/sec Loss 1.7578 LearningRate 0.0041 Epoch: 15 Global Step: 198090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:35,513-Speed 3347.25 samples/sec Loss 1.8020 LearningRate 0.0041 Epoch: 15 Global Step: 198100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:38,585-Speed 3334.60 samples/sec Loss 1.8150 LearningRate 0.0041 Epoch: 15 Global Step: 198110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:41,663-Speed 3328.39 samples/sec Loss 1.7762 LearningRate 0.0041 Epoch: 15 Global Step: 198120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:44,745-Speed 3324.06 samples/sec Loss 1.7797 LearningRate 0.0041 Epoch: 15 Global Step: 198130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:47,878-Speed 3269.51 samples/sec Loss 1.7542 LearningRate 0.0041 Epoch: 15 Global Step: 198140 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-04-27 18:54:51,001-Speed 3279.26 samples/sec Loss 1.7551 LearningRate 0.0041 Epoch: 15 Global Step: 198150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:54,114-Speed 3290.68 samples/sec Loss 1.7575 LearningRate 0.0041 Epoch: 15 Global Step: 198160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:54:57,171-Speed 3350.37 samples/sec Loss 1.7327 LearningRate 0.0041 Epoch: 15 Global Step: 198170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:00,262-Speed 3313.97 samples/sec Loss 1.6990 LearningRate 0.0041 Epoch: 15 Global Step: 198180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:03,404-Speed 3260.49 samples/sec Loss 1.7515 LearningRate 0.0041 Epoch: 15 Global Step: 198190 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:06,550-Speed 3255.61 samples/sec Loss 1.7770 LearningRate 0.0041 Epoch: 15 Global Step: 198200 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:09,601-Speed 3357.24 samples/sec Loss 1.6991 LearningRate 0.0041 Epoch: 15 Global Step: 198210 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:12,771-Speed 3231.53 samples/sec Loss 1.7656 LearningRate 0.0041 Epoch: 15 Global Step: 198220 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:15,822-Speed 3357.07 samples/sec Loss 1.7039 LearningRate 0.0041 Epoch: 15 Global Step: 198230 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:18,903-Speed 3325.11 samples/sec Loss 1.7571 LearningRate 0.0041 Epoch: 15 Global Step: 198240 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:21,958-Speed 3352.47 samples/sec Loss 1.7542 LearningRate 0.0041 Epoch: 15 Global Step: 198250 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:25,042-Speed 3321.18 samples/sec Loss 1.8141 LearningRate 0.0041 Epoch: 15 Global Step: 198260 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:28,184-Speed 
3260.41 samples/sec Loss 1.7841 LearningRate 0.0041 Epoch: 15 Global Step: 198270 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:31,299-Speed 3288.21 samples/sec Loss 1.7499 LearningRate 0.0041 Epoch: 15 Global Step: 198280 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:34,375-Speed 3330.32 samples/sec Loss 1.7978 LearningRate 0.0041 Epoch: 15 Global Step: 198290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:37,453-Speed 3329.58 samples/sec Loss 1.6566 LearningRate 0.0041 Epoch: 15 Global Step: 198300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:40,577-Speed 3278.29 samples/sec Loss 1.7694 LearningRate 0.0041 Epoch: 15 Global Step: 198310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:43,708-Speed 3272.30 samples/sec Loss 1.7694 LearningRate 0.0041 Epoch: 15 Global Step: 198320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:46,796-Speed 3316.96 samples/sec Loss 1.7678 LearningRate 0.0041 Epoch: 15 Global Step: 198330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:49,955-Speed 3241.97 samples/sec Loss 1.7805 LearningRate 0.0041 Epoch: 15 Global Step: 198340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:55:53,078-Speed 3280.47 samples/sec Loss 1.7469 LearningRate 0.0041 Epoch: 15 Global Step: 198350 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:56,187-Speed 3294.44 samples/sec Loss 1.7157 LearningRate 0.0041 Epoch: 15 Global Step: 198360 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:55:59,256-Speed 3337.85 samples/sec Loss 1.7979 LearningRate 0.0041 Epoch: 15 Global Step: 198370 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:02,326-Speed 3336.58 samples/sec Loss 1.7718 LearningRate 0.0041 Epoch: 15 Global Step: 198380 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:05,411-Speed 3320.13 samples/sec Loss 1.7897 LearningRate 
0.0041 Epoch: 15 Global Step: 198390 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:08,482-Speed 3335.51 samples/sec Loss 1.7409 LearningRate 0.0041 Epoch: 15 Global Step: 198400 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:11,599-Speed 3286.07 samples/sec Loss 1.8025 LearningRate 0.0041 Epoch: 15 Global Step: 198410 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:14,690-Speed 3314.34 samples/sec Loss 1.7846 LearningRate 0.0041 Epoch: 15 Global Step: 198420 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:17,779-Speed 3315.81 samples/sec Loss 1.6927 LearningRate 0.0040 Epoch: 15 Global Step: 198430 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:20,888-Speed 3294.55 samples/sec Loss 1.6961 LearningRate 0.0040 Epoch: 15 Global Step: 198440 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 18:56:24,026-Speed 3264.65 samples/sec Loss 1.7289 LearningRate 0.0040 Epoch: 15 Global Step: 198450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:27,148-Speed 3281.15 samples/sec Loss 1.7272 LearningRate 0.0040 Epoch: 15 Global Step: 198460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:30,313-Speed 3236.19 samples/sec Loss 1.7225 LearningRate 0.0040 Epoch: 15 Global Step: 198470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:33,378-Speed 3341.95 samples/sec Loss 1.6860 LearningRate 0.0040 Epoch: 15 Global Step: 198480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:36,545-Speed 3234.87 samples/sec Loss 1.7173 LearningRate 0.0040 Epoch: 15 Global Step: 198490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:39,604-Speed 3347.60 samples/sec Loss 1.7481 LearningRate 0.0040 Epoch: 15 Global Step: 198500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:42,683-Speed 3328.04 samples/sec Loss 1.7367 LearningRate 0.0040 Epoch: 15 Global Step: 198510 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:45,750-Speed 3339.26 samples/sec Loss 1.8026 LearningRate 0.0040 Epoch: 15 Global Step: 198520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:48,859-Speed 3295.75 samples/sec Loss 1.7199 LearningRate 0.0040 Epoch: 15 Global Step: 198530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:51,961-Speed 3301.28 samples/sec Loss 1.7550 LearningRate 0.0040 Epoch: 15 Global Step: 198540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:56:55,035-Speed 3332.34 samples/sec Loss 1.6932 LearningRate 0.0040 Epoch: 15 Global Step: 198550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:56:58,142-Speed 3296.96 samples/sec Loss 1.7170 LearningRate 0.0040 Epoch: 15 Global Step: 198560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:57:01,204-Speed 3345.71 samples/sec Loss 1.8109 LearningRate 0.0040 Epoch: 15 Global Step: 198570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:57:04,359-Speed 3246.17 samples/sec Loss 1.7901 LearningRate 0.0040 Epoch: 15 Global Step: 198580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:07,422-Speed 3344.53 samples/sec Loss 1.7927 LearningRate 0.0040 Epoch: 15 Global Step: 198590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:10,468-Speed 3362.27 samples/sec Loss 1.7774 LearningRate 0.0040 Epoch: 15 Global Step: 198600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:13,547-Speed 3327.60 samples/sec Loss 1.7462 LearningRate 0.0040 Epoch: 15 Global Step: 198610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:16,669-Speed 3280.41 samples/sec Loss 1.7541 LearningRate 0.0040 Epoch: 15 Global Step: 198620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:19,755-Speed 3319.79 samples/sec Loss 1.7374 LearningRate 0.0040 Epoch: 15 Global Step: 198630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-04-27 18:57:22,887-Speed 3270.39 samples/sec Loss 1.7717 LearningRate 0.0040 Epoch: 15 Global Step: 198640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:26,047-Speed 3241.78 samples/sec Loss 1.7826 LearningRate 0.0040 Epoch: 15 Global Step: 198650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:29,137-Speed 3314.49 samples/sec Loss 1.7750 LearningRate 0.0040 Epoch: 15 Global Step: 198660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:32,218-Speed 3324.71 samples/sec Loss 1.7172 LearningRate 0.0040 Epoch: 15 Global Step: 198670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:35,342-Speed 3279.07 samples/sec Loss 1.7674 LearningRate 0.0040 Epoch: 15 Global Step: 198680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:57:38,403-Speed 3346.09 samples/sec Loss 1.7835 LearningRate 0.0040 Epoch: 15 Global Step: 198690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:41,475-Speed 3334.12 samples/sec Loss 1.7607 LearningRate 0.0040 Epoch: 15 Global Step: 198700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:44,562-Speed 3318.99 samples/sec Loss 1.7474 LearningRate 0.0040 Epoch: 15 Global Step: 198710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:47,669-Speed 3296.91 samples/sec Loss 1.7844 LearningRate 0.0040 Epoch: 15 Global Step: 198720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:57:50,941-Speed 3130.63 samples/sec Loss 1.7282 LearningRate 0.0040 Epoch: 15 Global Step: 198730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:22,736-Speed 322.09 samples/sec Loss 1.5482 LearningRate 0.0040 Epoch: 16 Global Step: 198740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:26,313-Speed 2863.61 samples/sec Loss 1.3278 LearningRate 0.0040 Epoch: 16 Global Step: 198750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:29,508-Speed 3205.39 
samples/sec Loss 1.3362 LearningRate 0.0040 Epoch: 16 Global Step: 198760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:32,600-Speed 3314.23 samples/sec Loss 1.3548 LearningRate 0.0040 Epoch: 16 Global Step: 198770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:35,682-Speed 3323.11 samples/sec Loss 1.2549 LearningRate 0.0040 Epoch: 16 Global Step: 198780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:38,848-Speed 3235.40 samples/sec Loss 1.2861 LearningRate 0.0040 Epoch: 16 Global Step: 198790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 18:58:41,919-Speed 3334.80 samples/sec Loss 1.3060 LearningRate 0.0040 Epoch: 16 Global Step: 198800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:44,994-Speed 3332.03 samples/sec Loss 1.2763 LearningRate 0.0040 Epoch: 16 Global Step: 198810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:48,073-Speed 3326.95 samples/sec Loss 1.3100 LearningRate 0.0040 Epoch: 16 Global Step: 198820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:51,136-Speed 3344.15 samples/sec Loss 1.2675 LearningRate 0.0040 Epoch: 16 Global Step: 198830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:54,208-Speed 3334.21 samples/sec Loss 1.2710 LearningRate 0.0040 Epoch: 16 Global Step: 198840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:58:57,276-Speed 3339.27 samples/sec Loss 1.2705 LearningRate 0.0040 Epoch: 16 Global Step: 198850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:00,457-Speed 3219.82 samples/sec Loss 1.3113 LearningRate 0.0040 Epoch: 16 Global Step: 198860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:03,627-Speed 3231.61 samples/sec Loss 1.2666 LearningRate 0.0040 Epoch: 16 Global Step: 198870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:06,728-Speed 3303.31 samples/sec Loss 1.2935 LearningRate 0.0040 
Epoch: 16 Global Step: 198880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:09,804-Speed 3329.47 samples/sec Loss 1.2864 LearningRate 0.0040 Epoch: 16 Global Step: 198890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:12,900-Speed 3309.19 samples/sec Loss 1.2689 LearningRate 0.0040 Epoch: 16 Global Step: 198900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:16,194-Speed 3109.41 samples/sec Loss 1.2443 LearningRate 0.0040 Epoch: 16 Global Step: 198910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:19,395-Speed 3200.09 samples/sec Loss 1.2406 LearningRate 0.0040 Epoch: 16 Global Step: 198920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:22,667-Speed 3130.14 samples/sec Loss 1.2856 LearningRate 0.0040 Epoch: 16 Global Step: 198930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:25,884-Speed 3184.62 samples/sec Loss 1.2875 LearningRate 0.0040 Epoch: 16 Global Step: 198940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:29,129-Speed 3156.47 samples/sec Loss 1.2929 LearningRate 0.0040 Epoch: 16 Global Step: 198950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:32,213-Speed 3321.35 samples/sec Loss 1.3014 LearningRate 0.0040 Epoch: 16 Global Step: 198960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:35,275-Speed 3345.99 samples/sec Loss 1.2890 LearningRate 0.0040 Epoch: 16 Global Step: 198970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:38,381-Speed 3297.89 samples/sec Loss 1.2953 LearningRate 0.0040 Epoch: 16 Global Step: 198980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:41,553-Speed 3228.79 samples/sec Loss 1.2703 LearningRate 0.0040 Epoch: 16 Global Step: 198990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:44,617-Speed 3342.94 samples/sec Loss 1.2692 LearningRate 0.0040 Epoch: 16 Global Step: 199000 Fp16 Grad 
Scale: 32768 Required: 4 hours Training: 2022-04-27 18:59:47,675-Speed 3350.59 samples/sec Loss 1.2754 LearningRate 0.0040 Epoch: 16 Global Step: 199010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:50,735-Speed 3346.65 samples/sec Loss 1.2818 LearningRate 0.0040 Epoch: 16 Global Step: 199020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:53,797-Speed 3345.46 samples/sec Loss 1.3163 LearningRate 0.0040 Epoch: 16 Global Step: 199030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 18:59:56,906-Speed 3294.68 samples/sec Loss 1.3059 LearningRate 0.0040 Epoch: 16 Global Step: 199040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:00,021-Speed 3288.13 samples/sec Loss 1.3297 LearningRate 0.0039 Epoch: 16 Global Step: 199050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:03,131-Speed 3293.76 samples/sec Loss 1.2853 LearningRate 0.0039 Epoch: 16 Global Step: 199060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:06,281-Speed 3251.86 samples/sec Loss 1.2673 LearningRate 0.0039 Epoch: 16 Global Step: 199070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:09,374-Speed 3311.85 samples/sec Loss 1.3228 LearningRate 0.0039 Epoch: 16 Global Step: 199080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:12,482-Speed 3296.02 samples/sec Loss 1.2651 LearningRate 0.0039 Epoch: 16 Global Step: 199090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:15,658-Speed 3225.09 samples/sec Loss 1.2545 LearningRate 0.0039 Epoch: 16 Global Step: 199100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:18,736-Speed 3328.04 samples/sec Loss 1.2544 LearningRate 0.0039 Epoch: 16 Global Step: 199110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:00:21,773-Speed 3372.73 samples/sec Loss 1.3102 LearningRate 0.0039 Epoch: 16 Global Step: 199120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-04-27 19:00:24,951-Speed 3223.35 samples/sec Loss 1.2492 LearningRate 0.0039 Epoch: 16 Global Step: 199130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:28,092-Speed 3261.55 samples/sec Loss 1.3039 LearningRate 0.0039 Epoch: 16 Global Step: 199140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:31,175-Speed 3321.88 samples/sec Loss 1.2348 LearningRate 0.0039 Epoch: 16 Global Step: 199150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:34,261-Speed 3320.52 samples/sec Loss 1.2488 LearningRate 0.0039 Epoch: 16 Global Step: 199160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:37,411-Speed 3251.08 samples/sec Loss 1.3230 LearningRate 0.0039 Epoch: 16 Global Step: 199170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:40,516-Speed 3299.81 samples/sec Loss 1.3155 LearningRate 0.0039 Epoch: 16 Global Step: 199180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:43,651-Speed 3267.47 samples/sec Loss 1.2984 LearningRate 0.0039 Epoch: 16 Global Step: 199190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:46,740-Speed 3315.88 samples/sec Loss 1.2900 LearningRate 0.0039 Epoch: 16 Global Step: 199200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:49,848-Speed 3295.67 samples/sec Loss 1.3079 LearningRate 0.0039 Epoch: 16 Global Step: 199210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:52,946-Speed 3306.34 samples/sec Loss 1.2767 LearningRate 0.0039 Epoch: 16 Global Step: 199220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:00:55,994-Speed 3361.02 samples/sec Loss 1.2600 LearningRate 0.0039 Epoch: 16 Global Step: 199230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:00:59,059-Speed 3341.58 samples/sec Loss 1.3493 LearningRate 0.0039 Epoch: 16 Global Step: 199240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:02,179-Speed 3283.69 
samples/sec Loss 1.2892 LearningRate 0.0039 Epoch: 16 Global Step: 199250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:05,294-Speed 3288.39 samples/sec Loss 1.2456 LearningRate 0.0039 Epoch: 16 Global Step: 199260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:08,372-Speed 3328.17 samples/sec Loss 1.2430 LearningRate 0.0039 Epoch: 16 Global Step: 199270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:11,468-Speed 3308.73 samples/sec Loss 1.2627 LearningRate 0.0039 Epoch: 16 Global Step: 199280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:14,530-Speed 3344.88 samples/sec Loss 1.2795 LearningRate 0.0039 Epoch: 16 Global Step: 199290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:17,628-Speed 3306.42 samples/sec Loss 1.2897 LearningRate 0.0039 Epoch: 16 Global Step: 199300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:20,692-Speed 3343.31 samples/sec Loss 1.2446 LearningRate 0.0039 Epoch: 16 Global Step: 199310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:23,758-Speed 3340.77 samples/sec Loss 1.3204 LearningRate 0.0039 Epoch: 16 Global Step: 199320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:26,863-Speed 3299.21 samples/sec Loss 1.3642 LearningRate 0.0039 Epoch: 16 Global Step: 199330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:29,993-Speed 3272.70 samples/sec Loss 1.3071 LearningRate 0.0039 Epoch: 16 Global Step: 199340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:33,103-Speed 3293.36 samples/sec Loss 1.3087 LearningRate 0.0039 Epoch: 16 Global Step: 199350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:01:36,297-Speed 3207.39 samples/sec Loss 1.3111 LearningRate 0.0039 Epoch: 16 Global Step: 199360 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:01:39,406-Speed 3295.01 samples/sec Loss 1.2961 LearningRate 0.0039 
Epoch: 16 Global Step: 199370 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:01:42,568-Speed 3239.05 samples/sec Loss 1.2625 LearningRate 0.0039 Epoch: 16 Global Step: 199380 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:01:45,691-Speed 3280.32 samples/sec Loss 1.2909 LearningRate 0.0039 Epoch: 16 Global Step: 199390 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:01:48,806-Speed 3288.06 samples/sec Loss 1.2859 LearningRate 0.0039 Epoch: 16 Global Step: 199400 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:01:51,933-Speed 3275.24 samples/sec Loss 1.3441 LearningRate 0.0039 Epoch: 16 Global Step: 199410 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:01:55,038-Speed 3299.42 samples/sec Loss 1.2912 LearningRate 0.0039 Epoch: 16 Global Step: 199420 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:01:58,090-Speed 3355.86 samples/sec Loss 1.3234 LearningRate 0.0039 Epoch: 16 Global Step: 199430 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:01,196-Speed 3298.82 samples/sec Loss 1.2973 LearningRate 0.0039 Epoch: 16 Global Step: 199440 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:04,261-Speed 3341.39 samples/sec Loss 1.2901 LearningRate 0.0039 Epoch: 16 Global Step: 199450 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:07,340-Speed 3327.49 samples/sec Loss 1.3178 LearningRate 0.0039 Epoch: 16 Global Step: 199460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:02:10,429-Speed 3316.26 samples/sec Loss 1.3234 LearningRate 0.0039 Epoch: 16 Global Step: 199470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:02:13,538-Speed 3293.62 samples/sec Loss 1.3260 LearningRate 0.0039 Epoch: 16 Global Step: 199480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:02:16,612-Speed 3332.52 samples/sec Loss 1.2979 LearningRate 0.0039 Epoch: 16 Global Step: 199490 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 19:02:19,695-Speed 3322.77 samples/sec Loss 1.3374 LearningRate 0.0039 Epoch: 16 Global Step: 199500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:02:22,781-Speed 3319.44 samples/sec Loss 1.2724 LearningRate 0.0039 Epoch: 16 Global Step: 199510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:02:25,848-Speed 3340.07 samples/sec Loss 1.3072 LearningRate 0.0039 Epoch: 16 Global Step: 199520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:02:28,901-Speed 3355.03 samples/sec Loss 1.3506 LearningRate 0.0039 Epoch: 16 Global Step: 199530 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:31,973-Speed 3334.56 samples/sec Loss 1.2941 LearningRate 0.0039 Epoch: 16 Global Step: 199540 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:35,087-Speed 3289.77 samples/sec Loss 1.3588 LearningRate 0.0039 Epoch: 16 Global Step: 199550 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:38,240-Speed 3248.81 samples/sec Loss 1.3396 LearningRate 0.0039 Epoch: 16 Global Step: 199560 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:41,328-Speed 3316.48 samples/sec Loss 1.2938 LearningRate 0.0039 Epoch: 16 Global Step: 199570 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:44,429-Speed 3302.95 samples/sec Loss 1.3577 LearningRate 0.0039 Epoch: 16 Global Step: 199580 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:47,543-Speed 3290.01 samples/sec Loss 1.3012 LearningRate 0.0039 Epoch: 16 Global Step: 199590 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:50,669-Speed 3276.71 samples/sec Loss 1.3177 LearningRate 0.0039 Epoch: 16 Global Step: 199600 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:02:53,845-Speed 3225.66 samples/sec Loss 1.2818 LearningRate 0.0039 Epoch: 16 Global Step: 199610 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 
19:02:56,930-Speed 3320.45 samples/sec Loss 1.2721 LearningRate 0.0039 Epoch: 16 Global Step: 199620 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:00,004-Speed 3331.94 samples/sec Loss 1.2445 LearningRate 0.0039 Epoch: 16 Global Step: 199630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:03,141-Speed 3265.11 samples/sec Loss 1.2676 LearningRate 0.0039 Epoch: 16 Global Step: 199640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:06,214-Speed 3333.27 samples/sec Loss 1.2918 LearningRate 0.0039 Epoch: 16 Global Step: 199650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:09,279-Speed 3342.34 samples/sec Loss 1.2580 LearningRate 0.0039 Epoch: 16 Global Step: 199660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:12,371-Speed 3312.69 samples/sec Loss 1.2726 LearningRate 0.0039 Epoch: 16 Global Step: 199670 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:15,474-Speed 3301.74 samples/sec Loss 1.3054 LearningRate 0.0038 Epoch: 16 Global Step: 199680 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:18,590-Speed 3287.61 samples/sec Loss 1.2811 LearningRate 0.0038 Epoch: 16 Global Step: 199690 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:21,677-Speed 3317.41 samples/sec Loss 1.2568 LearningRate 0.0038 Epoch: 16 Global Step: 199700 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:24,776-Speed 3305.42 samples/sec Loss 1.3155 LearningRate 0.0038 Epoch: 16 Global Step: 199710 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:27,837-Speed 3347.02 samples/sec Loss 1.3068 LearningRate 0.0038 Epoch: 16 Global Step: 199720 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:30,928-Speed 3313.16 samples/sec Loss 1.3157 LearningRate 0.0038 Epoch: 16 Global Step: 199730 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:34,042-Speed 3290.42 samples/sec Loss 1.3134 
LearningRate 0.0038 Epoch: 16 Global Step: 199740 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:37,126-Speed 3321.41 samples/sec Loss 1.2960 LearningRate 0.0038 Epoch: 16 Global Step: 199750 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:40,250-Speed 3278.06 samples/sec Loss 1.2871 LearningRate 0.0038 Epoch: 16 Global Step: 199760 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:03:43,328-Speed 3328.29 samples/sec Loss 1.3058 LearningRate 0.0038 Epoch: 16 Global Step: 199770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:46,394-Speed 3341.13 samples/sec Loss 1.2724 LearningRate 0.0038 Epoch: 16 Global Step: 199780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:49,538-Speed 3258.43 samples/sec Loss 1.3028 LearningRate 0.0038 Epoch: 16 Global Step: 199790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:52,624-Speed 3319.59 samples/sec Loss 1.2975 LearningRate 0.0038 Epoch: 16 Global Step: 199800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:55,730-Speed 3296.95 samples/sec Loss 1.3056 LearningRate 0.0038 Epoch: 16 Global Step: 199810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:03:58,798-Speed 3339.67 samples/sec Loss 1.3038 LearningRate 0.0038 Epoch: 16 Global Step: 199820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:01,859-Speed 3346.02 samples/sec Loss 1.2922 LearningRate 0.0038 Epoch: 16 Global Step: 199830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:04,921-Speed 3345.32 samples/sec Loss 1.3402 LearningRate 0.0038 Epoch: 16 Global Step: 199840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:08,000-Speed 3326.62 samples/sec Loss 1.3603 LearningRate 0.0038 Epoch: 16 Global Step: 199850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:11,137-Speed 3265.27 samples/sec Loss 1.2927 LearningRate 0.0038 Epoch: 16 Global Step: 
199860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:14,203-Speed 3341.59 samples/sec Loss 1.3013 LearningRate 0.0038 Epoch: 16 Global Step: 199870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:04:17,278-Speed 3330.92 samples/sec Loss 1.3296 LearningRate 0.0038 Epoch: 16 Global Step: 199880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:20,350-Speed 3333.51 samples/sec Loss 1.3075 LearningRate 0.0038 Epoch: 16 Global Step: 199890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:23,417-Speed 3340.10 samples/sec Loss 1.2986 LearningRate 0.0038 Epoch: 16 Global Step: 199900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:26,480-Speed 3344.49 samples/sec Loss 1.2550 LearningRate 0.0038 Epoch: 16 Global Step: 199910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:29,579-Speed 3305.30 samples/sec Loss 1.2869 LearningRate 0.0038 Epoch: 16 Global Step: 199920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:32,659-Speed 3325.55 samples/sec Loss 1.2749 LearningRate 0.0038 Epoch: 16 Global Step: 199930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:35,717-Speed 3349.22 samples/sec Loss 1.2403 LearningRate 0.0038 Epoch: 16 Global Step: 199940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:38,776-Speed 3348.56 samples/sec Loss 1.3323 LearningRate 0.0038 Epoch: 16 Global Step: 199950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:41,852-Speed 3330.84 samples/sec Loss 1.2957 LearningRate 0.0038 Epoch: 16 Global Step: 199960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:44,935-Speed 3322.59 samples/sec Loss 1.2717 LearningRate 0.0038 Epoch: 16 Global Step: 199970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:47,994-Speed 3347.67 samples/sec Loss 1.3225 LearningRate 0.0038 Epoch: 16 Global Step: 199980 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-04-27 19:04:51,070-Speed 3329.87 samples/sec Loss 1.2781 LearningRate 0.0038 Epoch: 16 Global Step: 199990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:54,172-Speed 3302.13 samples/sec Loss 1.2844 LearningRate 0.0038 Epoch: 16 Global Step: 200000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:04:57,234-Speed 3345.24 samples/sec Loss 1.2609 LearningRate 0.0038 Epoch: 16 Global Step: 200010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:00,331-Speed 3307.84 samples/sec Loss 1.3107 LearningRate 0.0038 Epoch: 16 Global Step: 200020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:03,452-Speed 3282.02 samples/sec Loss 1.3117 LearningRate 0.0038 Epoch: 16 Global Step: 200030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:06,653-Speed 3199.83 samples/sec Loss 1.3152 LearningRate 0.0038 Epoch: 16 Global Step: 200040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:09,726-Speed 3333.25 samples/sec Loss 1.2642 LearningRate 0.0038 Epoch: 16 Global Step: 200050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:12,793-Speed 3340.75 samples/sec Loss 1.3397 LearningRate 0.0038 Epoch: 16 Global Step: 200060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:15,886-Speed 3311.58 samples/sec Loss 1.3736 LearningRate 0.0038 Epoch: 16 Global Step: 200070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:18,985-Speed 3305.10 samples/sec Loss 1.3249 LearningRate 0.0038 Epoch: 16 Global Step: 200080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:22,057-Speed 3333.81 samples/sec Loss 1.3165 LearningRate 0.0038 Epoch: 16 Global Step: 200090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:25,119-Speed 3345.26 samples/sec Loss 1.3297 LearningRate 0.0038 Epoch: 16 Global Step: 200100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:05:28,209-Speed 3315.14 samples/sec Loss 1.3388 LearningRate 0.0038 Epoch: 16 Global Step: 200110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:31,266-Speed 3351.60 samples/sec Loss 1.3228 LearningRate 0.0038 Epoch: 16 Global Step: 200120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:34,340-Speed 3331.62 samples/sec Loss 1.2984 LearningRate 0.0038 Epoch: 16 Global Step: 200130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:05:37,444-Speed 3299.92 samples/sec Loss 1.3299 LearningRate 0.0038 Epoch: 16 Global Step: 200140 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:05:40,613-Speed 3232.91 samples/sec Loss 1.3408 LearningRate 0.0038 Epoch: 16 Global Step: 200150 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:05:43,672-Speed 3348.46 samples/sec Loss 1.3458 LearningRate 0.0038 Epoch: 16 Global Step: 200160 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:05:46,744-Speed 3336.80 samples/sec Loss 1.3251 LearningRate 0.0038 Epoch: 16 Global Step: 200170 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:05:49,832-Speed 3317.38 samples/sec Loss 1.3286 LearningRate 0.0038 Epoch: 16 Global Step: 200180 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:05:53,037-Speed 3195.63 samples/sec Loss 1.3733 LearningRate 0.0038 Epoch: 16 Global Step: 200190 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:05:56,115-Speed 3327.41 samples/sec Loss 1.3017 LearningRate 0.0038 Epoch: 16 Global Step: 200200 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:05:59,213-Speed 3306.64 samples/sec Loss 1.3353 LearningRate 0.0038 Epoch: 16 Global Step: 200210 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:02,410-Speed 3204.07 samples/sec Loss 1.3867 LearningRate 0.0038 Epoch: 16 Global Step: 200220 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:05,601-Speed 3209.64 samples/sec Loss 1.3125 
LearningRate 0.0038 Epoch: 16 Global Step: 200230 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:08,684-Speed 3323.29 samples/sec Loss 1.3691 LearningRate 0.0038 Epoch: 16 Global Step: 200240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:06:11,734-Speed 3358.01 samples/sec Loss 1.3135 LearningRate 0.0038 Epoch: 16 Global Step: 200250 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:15,732-Speed 2561.72 samples/sec Loss 1.3126 LearningRate 0.0038 Epoch: 16 Global Step: 200260 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:18,879-Speed 3255.05 samples/sec Loss 1.3152 LearningRate 0.0038 Epoch: 16 Global Step: 200270 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:21,954-Speed 3331.12 samples/sec Loss 1.2735 LearningRate 0.0038 Epoch: 16 Global Step: 200280 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:25,074-Speed 3282.88 samples/sec Loss 1.3038 LearningRate 0.0038 Epoch: 16 Global Step: 200290 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:28,258-Speed 3217.65 samples/sec Loss 1.3242 LearningRate 0.0038 Epoch: 16 Global Step: 200300 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:31,330-Speed 3334.17 samples/sec Loss 1.3350 LearningRate 0.0038 Epoch: 16 Global Step: 200310 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:34,445-Speed 3288.54 samples/sec Loss 1.3158 LearningRate 0.0037 Epoch: 16 Global Step: 200320 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:37,576-Speed 3271.17 samples/sec Loss 1.3267 LearningRate 0.0037 Epoch: 16 Global Step: 200330 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:40,697-Speed 3282.44 samples/sec Loss 1.2940 LearningRate 0.0037 Epoch: 16 Global Step: 200340 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:06:43,790-Speed 3311.43 samples/sec Loss 1.2895 LearningRate 0.0037 Epoch: 16 Global Step: 200350 Fp16 
Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:06:46,917-Speed 3276.02 samples/sec Loss 1.2921 LearningRate 0.0037 Epoch: 16 Global Step: 200360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:06:50,086-Speed 3232.13 samples/sec Loss 1.2857 LearningRate 0.0037 Epoch: 16 Global Step: 200370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:06:53,251-Speed 3236.68 samples/sec Loss 1.2983 LearningRate 0.0037 Epoch: 16 Global Step: 200380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:06:56,366-Speed 3288.25 samples/sec Loss 1.3333 LearningRate 0.0037 Epoch: 16 Global Step: 200390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:06:59,517-Speed 3251.75 samples/sec Loss 1.3640 LearningRate 0.0037 Epoch: 16 Global Step: 200400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:02,620-Speed 3300.37 samples/sec Loss 1.3188 LearningRate 0.0037 Epoch: 16 Global Step: 200410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:05,677-Speed 3350.41 samples/sec Loss 1.3264 LearningRate 0.0037 Epoch: 16 Global Step: 200420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:08,735-Speed 3350.51 samples/sec Loss 1.3558 LearningRate 0.0037 Epoch: 16 Global Step: 200430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:11,836-Speed 3302.41 samples/sec Loss 1.3367 LearningRate 0.0037 Epoch: 16 Global Step: 200440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:14,922-Speed 3319.26 samples/sec Loss 1.3134 LearningRate 0.0037 Epoch: 16 Global Step: 200450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:07:18,045-Speed 3280.35 samples/sec Loss 1.3238 LearningRate 0.0037 Epoch: 16 Global Step: 200460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:07:21,099-Speed 3353.72 samples/sec Loss 1.3411 LearningRate 0.0037 Epoch: 16 Global Step: 200470 Fp16 Grad Scale: 32768 Required: 4 hours 
Training: 2022-04-27 19:07:24,172-Speed 3333.96 samples/sec Loss 1.2824 LearningRate 0.0037 Epoch: 16 Global Step: 200480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:07:27,293-Speed 3281.88 samples/sec Loss 1.3097 LearningRate 0.0037 Epoch: 16 Global Step: 200490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:07:30,485-Speed 3209.48 samples/sec Loss 1.3136 LearningRate 0.0037 Epoch: 16 Global Step: 200500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:07:33,585-Speed 3304.32 samples/sec Loss 1.2954 LearningRate 0.0037 Epoch: 16 Global Step: 200510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:36,812-Speed 3173.68 samples/sec Loss 1.3464 LearningRate 0.0037 Epoch: 16 Global Step: 200520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:40,007-Speed 3206.23 samples/sec Loss 1.3105 LearningRate 0.0037 Epoch: 16 Global Step: 200530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:43,159-Speed 3250.13 samples/sec Loss 1.3232 LearningRate 0.0037 Epoch: 16 Global Step: 200540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:07:46,212-Speed 3355.11 samples/sec Loss 1.3262 LearningRate 0.0037 Epoch: 16 Global Step: 200550 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:07:49,366-Speed 3247.64 samples/sec Loss 1.3049 LearningRate 0.0037 Epoch: 16 Global Step: 200560 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:07:52,566-Speed 3201.22 samples/sec Loss 1.2958 LearningRate 0.0037 Epoch: 16 Global Step: 200570 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:07:55,706-Speed 3262.30 samples/sec Loss 1.3757 LearningRate 0.0037 Epoch: 16 Global Step: 200580 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:07:58,788-Speed 3323.98 samples/sec Loss 1.3489 LearningRate 0.0037 Epoch: 16 Global Step: 200590 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:08:01,898-Speed 3293.06 
samples/sec Loss 1.3400 LearningRate 0.0037 Epoch: 16 Global Step: 200600 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:08:04,993-Speed 3309.34 samples/sec Loss 1.3010 LearningRate 0.0037 Epoch: 16 Global Step: 200610 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:08:08,071-Speed 3328.69 samples/sec Loss 1.3133 LearningRate 0.0037 Epoch: 16 Global Step: 200620 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:08:11,232-Speed 3240.71 samples/sec Loss 1.2665 LearningRate 0.0037 Epoch: 16 Global Step: 200630 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:08:14,341-Speed 3294.64 samples/sec Loss 1.3203 LearningRate 0.0037 Epoch: 16 Global Step: 200640 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:08:17,456-Speed 3287.49 samples/sec Loss 1.3263 LearningRate 0.0037 Epoch: 16 Global Step: 200650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:20,551-Speed 3310.50 samples/sec Loss 1.3048 LearningRate 0.0037 Epoch: 16 Global Step: 200660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:23,692-Speed 3260.51 samples/sec Loss 1.3644 LearningRate 0.0037 Epoch: 16 Global Step: 200670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:26,804-Speed 3291.65 samples/sec Loss 1.3852 LearningRate 0.0037 Epoch: 16 Global Step: 200680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:29,891-Speed 3318.50 samples/sec Loss 1.3153 LearningRate 0.0037 Epoch: 16 Global Step: 200690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:33,043-Speed 3249.35 samples/sec Loss 1.3739 LearningRate 0.0037 Epoch: 16 Global Step: 200700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:36,139-Speed 3309.21 samples/sec Loss 1.3339 LearningRate 0.0037 Epoch: 16 Global Step: 200710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:39,322-Speed 3217.74 samples/sec Loss 1.3052 LearningRate 0.0037 
Epoch: 16 Global Step: 200720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:42,421-Speed 3305.82 samples/sec Loss 1.3219 LearningRate 0.0037 Epoch: 16 Global Step: 200730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:45,485-Speed 3343.34 samples/sec Loss 1.3019 LearningRate 0.0037 Epoch: 16 Global Step: 200740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:48,631-Speed 3256.07 samples/sec Loss 1.3351 LearningRate 0.0037 Epoch: 16 Global Step: 200750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:08:51,695-Speed 3343.12 samples/sec Loss 1.3297 LearningRate 0.0037 Epoch: 16 Global Step: 200760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:54,802-Speed 3296.89 samples/sec Loss 1.3549 LearningRate 0.0037 Epoch: 16 Global Step: 200770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:08:57,896-Speed 3310.43 samples/sec Loss 1.3906 LearningRate 0.0037 Epoch: 16 Global Step: 200780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:01,014-Speed 3285.19 samples/sec Loss 1.3667 LearningRate 0.0037 Epoch: 16 Global Step: 200790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:04,132-Speed 3285.51 samples/sec Loss 1.3582 LearningRate 0.0037 Epoch: 16 Global Step: 200800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:07,206-Speed 3333.74 samples/sec Loss 1.3442 LearningRate 0.0037 Epoch: 16 Global Step: 200810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:10,258-Speed 3356.11 samples/sec Loss 1.3343 LearningRate 0.0037 Epoch: 16 Global Step: 200820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:13,327-Speed 3338.09 samples/sec Loss 1.3602 LearningRate 0.0037 Epoch: 16 Global Step: 200830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:16,396-Speed 3337.07 samples/sec Loss 1.2914 LearningRate 0.0037 Epoch: 16 Global Step: 200840 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:19,518-Speed 3281.60 samples/sec Loss 1.3454 LearningRate 0.0037 Epoch: 16 Global Step: 200850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:22,594-Speed 3329.23 samples/sec Loss 1.3194 LearningRate 0.0037 Epoch: 16 Global Step: 200860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:09:25,691-Speed 3308.21 samples/sec Loss 1.3234 LearningRate 0.0037 Epoch: 16 Global Step: 200870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:09:28,859-Speed 3233.15 samples/sec Loss 1.3274 LearningRate 0.0037 Epoch: 16 Global Step: 200880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:31,978-Speed 3284.70 samples/sec Loss 1.3110 LearningRate 0.0037 Epoch: 16 Global Step: 200890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:35,090-Speed 3291.06 samples/sec Loss 1.3513 LearningRate 0.0037 Epoch: 16 Global Step: 200900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:38,187-Speed 3308.03 samples/sec Loss 1.2972 LearningRate 0.0037 Epoch: 16 Global Step: 200910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:41,333-Speed 3254.95 samples/sec Loss 1.2959 LearningRate 0.0037 Epoch: 16 Global Step: 200920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:44,491-Speed 3244.23 samples/sec Loss 1.3277 LearningRate 0.0037 Epoch: 16 Global Step: 200930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:09:47,572-Speed 3324.67 samples/sec Loss 1.3326 LearningRate 0.0037 Epoch: 16 Global Step: 200940 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:09:50,684-Speed 3291.08 samples/sec Loss 1.3456 LearningRate 0.0037 Epoch: 16 Global Step: 200950 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:09:53,788-Speed 3300.64 samples/sec Loss 1.3066 LearningRate 0.0036 Epoch: 16 Global Step: 200960 Fp16 Grad Scale: 8192 Required: 4 hours Training: 
2022-04-27 19:09:56,866-Speed 3328.09 samples/sec Loss 1.3712 LearningRate 0.0036 Epoch: 16 Global Step: 200970 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:09:59,956-Speed 3314.81 samples/sec Loss 1.3155 LearningRate 0.0036 Epoch: 16 Global Step: 200980 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:03,041-Speed 3320.51 samples/sec Loss 1.3047 LearningRate 0.0036 Epoch: 16 Global Step: 200990 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:06,156-Speed 3287.98 samples/sec Loss 1.3369 LearningRate 0.0036 Epoch: 16 Global Step: 201000 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:09,210-Speed 3354.79 samples/sec Loss 1.3072 LearningRate 0.0036 Epoch: 16 Global Step: 201010 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:12,342-Speed 3270.65 samples/sec Loss 1.3243 LearningRate 0.0036 Epoch: 16 Global Step: 201020 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:15,440-Speed 3306.89 samples/sec Loss 1.3710 LearningRate 0.0036 Epoch: 16 Global Step: 201030 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:18,509-Speed 3337.66 samples/sec Loss 1.3147 LearningRate 0.0036 Epoch: 16 Global Step: 201040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:21,563-Speed 3353.43 samples/sec Loss 1.3435 LearningRate 0.0036 Epoch: 16 Global Step: 201050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:24,652-Speed 3318.49 samples/sec Loss 1.3431 LearningRate 0.0036 Epoch: 16 Global Step: 201060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:27,754-Speed 3301.64 samples/sec Loss 1.3735 LearningRate 0.0036 Epoch: 16 Global Step: 201070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:30,807-Speed 3354.84 samples/sec Loss 1.4119 LearningRate 0.0036 Epoch: 16 Global Step: 201080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:33,868-Speed 3346.97 samples/sec 
Loss 1.3518 LearningRate 0.0036 Epoch: 16 Global Step: 201090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:36,981-Speed 3290.04 samples/sec Loss 1.3352 LearningRate 0.0036 Epoch: 16 Global Step: 201100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:40,090-Speed 3294.88 samples/sec Loss 1.3764 LearningRate 0.0036 Epoch: 16 Global Step: 201110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:10:43,160-Speed 3336.91 samples/sec Loss 1.3282 LearningRate 0.0036 Epoch: 16 Global Step: 201120 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:46,215-Speed 3352.18 samples/sec Loss 1.3246 LearningRate 0.0036 Epoch: 16 Global Step: 201130 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:49,330-Speed 3288.48 samples/sec Loss 1.3722 LearningRate 0.0036 Epoch: 16 Global Step: 201140 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:52,515-Speed 3216.91 samples/sec Loss 1.3836 LearningRate 0.0036 Epoch: 16 Global Step: 201150 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:55,616-Speed 3303.00 samples/sec Loss 1.3356 LearningRate 0.0036 Epoch: 16 Global Step: 201160 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:10:58,689-Speed 3333.11 samples/sec Loss 1.3369 LearningRate 0.0036 Epoch: 16 Global Step: 201170 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:11:01,753-Speed 3343.07 samples/sec Loss 1.3357 LearningRate 0.0036 Epoch: 16 Global Step: 201180 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:11:04,898-Speed 3257.47 samples/sec Loss 1.3814 LearningRate 0.0036 Epoch: 16 Global Step: 201190 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:11:07,957-Speed 3348.19 samples/sec Loss 1.3262 LearningRate 0.0036 Epoch: 16 Global Step: 201200 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:11:11,032-Speed 3332.20 samples/sec Loss 1.3408 LearningRate 0.0036 Epoch: 16 Global 
Step: 201210 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:11:14,183-Speed 3251.03 samples/sec Loss 1.3839 LearningRate 0.0036 Epoch: 16 Global Step: 201220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:17,330-Speed 3254.71 samples/sec Loss 1.3181 LearningRate 0.0036 Epoch: 16 Global Step: 201230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:20,475-Speed 3256.84 samples/sec Loss 1.3676 LearningRate 0.0036 Epoch: 16 Global Step: 201240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:23,587-Speed 3292.05 samples/sec Loss 1.3347 LearningRate 0.0036 Epoch: 16 Global Step: 201250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:26,682-Speed 3310.10 samples/sec Loss 1.3154 LearningRate 0.0036 Epoch: 16 Global Step: 201260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:29,792-Speed 3293.00 samples/sec Loss 1.3557 LearningRate 0.0036 Epoch: 16 Global Step: 201270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:32,850-Speed 3350.53 samples/sec Loss 1.3793 LearningRate 0.0036 Epoch: 16 Global Step: 201280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:35,993-Speed 3258.48 samples/sec Loss 1.3637 LearningRate 0.0036 Epoch: 16 Global Step: 201290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:39,184-Speed 3210.55 samples/sec Loss 1.3019 LearningRate 0.0036 Epoch: 16 Global Step: 201300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:42,238-Speed 3353.78 samples/sec Loss 1.3570 LearningRate 0.0036 Epoch: 16 Global Step: 201310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:11:45,310-Speed 3334.48 samples/sec Loss 1.3386 LearningRate 0.0036 Epoch: 16 Global Step: 201320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:11:48,373-Speed 3343.73 samples/sec Loss 1.3143 LearningRate 0.0036 Epoch: 16 Global Step: 201330 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 19:11:51,479-Speed 3298.60 samples/sec Loss 1.3656 LearningRate 0.0036 Epoch: 16 Global Step: 201340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:11:54,611-Speed 3270.56 samples/sec Loss 1.3876 LearningRate 0.0036 Epoch: 16 Global Step: 201350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:11:57,695-Speed 3321.85 samples/sec Loss 1.3602 LearningRate 0.0036 Epoch: 16 Global Step: 201360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:12:00,804-Speed 3294.03 samples/sec Loss 1.3418 LearningRate 0.0036 Epoch: 16 Global Step: 201370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:03,944-Speed 3262.72 samples/sec Loss 1.3515 LearningRate 0.0036 Epoch: 16 Global Step: 201380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:07,105-Speed 3239.57 samples/sec Loss 1.3540 LearningRate 0.0036 Epoch: 16 Global Step: 201390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:10,168-Speed 3345.20 samples/sec Loss 1.3499 LearningRate 0.0036 Epoch: 16 Global Step: 201400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:13,282-Speed 3289.44 samples/sec Loss 1.3646 LearningRate 0.0036 Epoch: 16 Global Step: 201410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:16,371-Speed 3315.70 samples/sec Loss 1.3567 LearningRate 0.0036 Epoch: 16 Global Step: 201420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:19,471-Speed 3303.83 samples/sec Loss 1.2903 LearningRate 0.0036 Epoch: 16 Global Step: 201430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:22,522-Speed 3357.98 samples/sec Loss 1.3415 LearningRate 0.0036 Epoch: 16 Global Step: 201440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:25,647-Speed 3277.76 samples/sec Loss 1.3202 LearningRate 0.0036 Epoch: 16 Global Step: 201450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:12:28,765-Speed 3285.72 samples/sec Loss 1.3682 LearningRate 0.0036 Epoch: 16 Global Step: 201460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:31,843-Speed 3327.54 samples/sec Loss 1.3323 LearningRate 0.0036 Epoch: 16 Global Step: 201470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:12:34,900-Speed 3350.16 samples/sec Loss 1.3093 LearningRate 0.0036 Epoch: 16 Global Step: 201480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:38,030-Speed 3273.49 samples/sec Loss 1.3512 LearningRate 0.0036 Epoch: 16 Global Step: 201490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:41,167-Speed 3265.32 samples/sec Loss 1.3193 LearningRate 0.0036 Epoch: 16 Global Step: 201500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:44,296-Speed 3273.81 samples/sec Loss 1.3678 LearningRate 0.0036 Epoch: 16 Global Step: 201510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:47,386-Speed 3314.57 samples/sec Loss 1.3475 LearningRate 0.0036 Epoch: 16 Global Step: 201520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:50,496-Speed 3294.17 samples/sec Loss 1.3718 LearningRate 0.0036 Epoch: 16 Global Step: 201530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:53,573-Speed 3328.92 samples/sec Loss 1.3396 LearningRate 0.0036 Epoch: 16 Global Step: 201540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:56,647-Speed 3332.11 samples/sec Loss 1.2886 LearningRate 0.0036 Epoch: 16 Global Step: 201550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:12:59,789-Speed 3262.35 samples/sec Loss 1.3711 LearningRate 0.0036 Epoch: 16 Global Step: 201560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:02,928-Speed 3263.33 samples/sec Loss 1.3430 LearningRate 0.0036 Epoch: 16 Global Step: 201570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:06,014-Speed 3318.65 samples/sec Loss 
1.2763 LearningRate 0.0036 Epoch: 16 Global Step: 201580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:13:09,085-Speed 3335.62 samples/sec Loss 1.3748 LearningRate 0.0036 Epoch: 16 Global Step: 201590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:12,223-Speed 3264.51 samples/sec Loss 1.3565 LearningRate 0.0036 Epoch: 16 Global Step: 201600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:15,383-Speed 3240.82 samples/sec Loss 1.3546 LearningRate 0.0036 Epoch: 16 Global Step: 201610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:18,466-Speed 3322.56 samples/sec Loss 1.3551 LearningRate 0.0035 Epoch: 16 Global Step: 201620 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:21,532-Speed 3341.03 samples/sec Loss 1.3137 LearningRate 0.0035 Epoch: 16 Global Step: 201630 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:24,625-Speed 3311.89 samples/sec Loss 1.3492 LearningRate 0.0035 Epoch: 16 Global Step: 201640 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:27,724-Speed 3305.72 samples/sec Loss 1.3452 LearningRate 0.0035 Epoch: 16 Global Step: 201650 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:30,813-Speed 3316.10 samples/sec Loss 1.3375 LearningRate 0.0035 Epoch: 16 Global Step: 201660 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:33,882-Speed 3337.62 samples/sec Loss 1.3740 LearningRate 0.0035 Epoch: 16 Global Step: 201670 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:37,021-Speed 3263.24 samples/sec Loss 1.3238 LearningRate 0.0035 Epoch: 16 Global Step: 201680 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:40,142-Speed 3282.56 samples/sec Loss 1.3241 LearningRate 0.0035 Epoch: 16 Global Step: 201690 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:43,243-Speed 3302.35 samples/sec Loss 1.3527 LearningRate 0.0035 Epoch: 16 Global Step: 
201700 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:46,325-Speed 3323.58 samples/sec Loss 1.3416 LearningRate 0.0035 Epoch: 16 Global Step: 201710 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:13:49,456-Speed 3272.52 samples/sec Loss 1.3722 LearningRate 0.0035 Epoch: 16 Global Step: 201720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:52,609-Speed 3247.96 samples/sec Loss 1.3338 LearningRate 0.0035 Epoch: 16 Global Step: 201730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:55,720-Speed 3293.29 samples/sec Loss 1.2912 LearningRate 0.0035 Epoch: 16 Global Step: 201740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:13:58,820-Speed 3303.43 samples/sec Loss 1.3459 LearningRate 0.0035 Epoch: 16 Global Step: 201750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:01,876-Speed 3351.81 samples/sec Loss 1.3222 LearningRate 0.0035 Epoch: 16 Global Step: 201760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:05,004-Speed 3275.44 samples/sec Loss 1.3485 LearningRate 0.0035 Epoch: 16 Global Step: 201770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:08,135-Speed 3271.58 samples/sec Loss 1.3527 LearningRate 0.0035 Epoch: 16 Global Step: 201780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:11,232-Speed 3307.16 samples/sec Loss 1.3116 LearningRate 0.0035 Epoch: 16 Global Step: 201790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:14,329-Speed 3307.07 samples/sec Loss 1.3930 LearningRate 0.0035 Epoch: 16 Global Step: 201800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:17,440-Speed 3293.20 samples/sec Loss 1.2870 LearningRate 0.0035 Epoch: 16 Global Step: 201810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:20,485-Speed 3363.82 samples/sec Loss 1.3159 LearningRate 0.0035 Epoch: 16 Global Step: 201820 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-27 19:14:23,622-Speed 3265.19 samples/sec Loss 1.3966 LearningRate 0.0035 Epoch: 16 Global Step: 201830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:26,763-Speed 3261.29 samples/sec Loss 1.3851 LearningRate 0.0035 Epoch: 16 Global Step: 201840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:29,908-Speed 3256.95 samples/sec Loss 1.3158 LearningRate 0.0035 Epoch: 16 Global Step: 201850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:32,982-Speed 3332.12 samples/sec Loss 1.3583 LearningRate 0.0035 Epoch: 16 Global Step: 201860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:36,061-Speed 3326.39 samples/sec Loss 1.3252 LearningRate 0.0035 Epoch: 16 Global Step: 201870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:39,115-Speed 3354.70 samples/sec Loss 1.3485 LearningRate 0.0035 Epoch: 16 Global Step: 201880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:42,184-Speed 3337.61 samples/sec Loss 1.3165 LearningRate 0.0035 Epoch: 16 Global Step: 201890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:45,238-Speed 3354.55 samples/sec Loss 1.3291 LearningRate 0.0035 Epoch: 16 Global Step: 201900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:48,324-Speed 3318.16 samples/sec Loss 1.3394 LearningRate 0.0035 Epoch: 16 Global Step: 201910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:51,393-Speed 3340.61 samples/sec Loss 1.3784 LearningRate 0.0035 Epoch: 16 Global Step: 201920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:54,448-Speed 3353.36 samples/sec Loss 1.3469 LearningRate 0.0035 Epoch: 16 Global Step: 201930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:14:57,505-Speed 3351.21 samples/sec Loss 1.3503 LearningRate 0.0035 Epoch: 16 Global Step: 201940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:15:00,561-Speed 3351.65 samples/sec Loss 1.3423 LearningRate 0.0035 Epoch: 16 Global Step: 201950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:03,697-Speed 3265.46 samples/sec Loss 1.3785 LearningRate 0.0035 Epoch: 16 Global Step: 201960 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:06,810-Speed 3291.30 samples/sec Loss 1.3456 LearningRate 0.0035 Epoch: 16 Global Step: 201970 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:09,937-Speed 3275.80 samples/sec Loss 1.3554 LearningRate 0.0035 Epoch: 16 Global Step: 201980 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:13,180-Speed 3158.16 samples/sec Loss 1.3260 LearningRate 0.0035 Epoch: 16 Global Step: 201990 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:16,339-Speed 3242.30 samples/sec Loss 1.3295 LearningRate 0.0035 Epoch: 16 Global Step: 202000 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:19,469-Speed 3272.52 samples/sec Loss 1.3908 LearningRate 0.0035 Epoch: 16 Global Step: 202010 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:22,548-Speed 3327.04 samples/sec Loss 1.3829 LearningRate 0.0035 Epoch: 16 Global Step: 202020 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:25,715-Speed 3234.58 samples/sec Loss 1.3184 LearningRate 0.0035 Epoch: 16 Global Step: 202030 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:28,882-Speed 3234.03 samples/sec Loss 1.3559 LearningRate 0.0035 Epoch: 16 Global Step: 202040 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:31,956-Speed 3332.32 samples/sec Loss 1.3324 LearningRate 0.0035 Epoch: 16 Global Step: 202050 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:15:35,085-Speed 3273.59 samples/sec Loss 1.3718 LearningRate 0.0035 Epoch: 16 Global Step: 202060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:38,280-Speed 3206.40 samples/sec Loss 1.4057 
LearningRate 0.0035 Epoch: 16 Global Step: 202070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:41,367-Speed 3318.21 samples/sec Loss 1.3898 LearningRate 0.0035 Epoch: 16 Global Step: 202080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:44,466-Speed 3305.40 samples/sec Loss 1.3625 LearningRate 0.0035 Epoch: 16 Global Step: 202090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:47,595-Speed 3273.55 samples/sec Loss 1.3632 LearningRate 0.0035 Epoch: 16 Global Step: 202100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:50,649-Speed 3353.59 samples/sec Loss 1.3963 LearningRate 0.0035 Epoch: 16 Global Step: 202110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:53,709-Speed 3348.34 samples/sec Loss 1.3678 LearningRate 0.0035 Epoch: 16 Global Step: 202120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:56,755-Speed 3362.58 samples/sec Loss 1.3956 LearningRate 0.0035 Epoch: 16 Global Step: 202130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:15:59,875-Speed 3282.55 samples/sec Loss 1.3438 LearningRate 0.0035 Epoch: 16 Global Step: 202140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:02,938-Speed 3344.52 samples/sec Loss 1.3248 LearningRate 0.0035 Epoch: 16 Global Step: 202150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:05,996-Speed 3350.10 samples/sec Loss 1.3308 LearningRate 0.0035 Epoch: 16 Global Step: 202160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:16:09,046-Speed 3358.41 samples/sec Loss 1.3630 LearningRate 0.0035 Epoch: 16 Global Step: 202170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:16:12,136-Speed 3314.39 samples/sec Loss 1.3556 LearningRate 0.0035 Epoch: 16 Global Step: 202180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:15,222-Speed 3319.69 samples/sec Loss 1.3543 LearningRate 0.0035 Epoch: 16 Global Step: 
202190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:18,285-Speed 3344.24 samples/sec Loss 1.4146 LearningRate 0.0035 Epoch: 16 Global Step: 202200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:21,342-Speed 3350.30 samples/sec Loss 1.3685 LearningRate 0.0035 Epoch: 16 Global Step: 202210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:24,407-Speed 3341.91 samples/sec Loss 1.3513 LearningRate 0.0035 Epoch: 16 Global Step: 202220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:27,560-Speed 3249.46 samples/sec Loss 1.3119 LearningRate 0.0035 Epoch: 16 Global Step: 202230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:30,690-Speed 3272.58 samples/sec Loss 1.3608 LearningRate 0.0035 Epoch: 16 Global Step: 202240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:33,780-Speed 3314.14 samples/sec Loss 1.3520 LearningRate 0.0035 Epoch: 16 Global Step: 202250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:36,873-Speed 3312.65 samples/sec Loss 1.3684 LearningRate 0.0035 Epoch: 16 Global Step: 202260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:39,957-Speed 3320.29 samples/sec Loss 1.3322 LearningRate 0.0035 Epoch: 16 Global Step: 202270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:43,043-Speed 3319.61 samples/sec Loss 1.3748 LearningRate 0.0034 Epoch: 16 Global Step: 202280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:16:46,115-Speed 3334.84 samples/sec Loss 1.3589 LearningRate 0.0034 Epoch: 16 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:16:49,189-Speed 3332.65 samples/sec Loss 1.3503 LearningRate 0.0034 Epoch: 16 Global Step: 202300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:52,253-Speed 3342.54 samples/sec Loss 1.3397 LearningRate 0.0034 Epoch: 16 Global Step: 202310 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-04-27 19:16:55,363-Speed 3293.35 samples/sec Loss 1.3759 LearningRate 0.0034 Epoch: 16 Global Step: 202320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:16:58,418-Speed 3353.44 samples/sec Loss 1.3162 LearningRate 0.0034 Epoch: 16 Global Step: 202330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:01,533-Speed 3288.03 samples/sec Loss 1.3640 LearningRate 0.0034 Epoch: 16 Global Step: 202340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:04,651-Speed 3285.01 samples/sec Loss 1.3626 LearningRate 0.0034 Epoch: 16 Global Step: 202350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:07,801-Speed 3251.81 samples/sec Loss 1.3347 LearningRate 0.0034 Epoch: 16 Global Step: 202360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:10,892-Speed 3314.36 samples/sec Loss 1.3434 LearningRate 0.0034 Epoch: 16 Global Step: 202370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:14,065-Speed 3227.86 samples/sec Loss 1.3233 LearningRate 0.0034 Epoch: 16 Global Step: 202380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:17,190-Speed 3277.66 samples/sec Loss 1.3734 LearningRate 0.0034 Epoch: 16 Global Step: 202390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:20,261-Speed 3335.61 samples/sec Loss 1.3083 LearningRate 0.0034 Epoch: 16 Global Step: 202400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:17:23,313-Speed 3356.04 samples/sec Loss 1.4089 LearningRate 0.0034 Epoch: 16 Global Step: 202410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:26,421-Speed 3296.04 samples/sec Loss 1.3911 LearningRate 0.0034 Epoch: 16 Global Step: 202420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:29,563-Speed 3260.02 samples/sec Loss 1.3581 LearningRate 0.0034 Epoch: 16 Global Step: 202430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:17:32,629-Speed 3341.86 samples/sec Loss 1.3222 LearningRate 0.0034 Epoch: 16 Global Step: 202440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:35,684-Speed 3352.71 samples/sec Loss 1.3613 LearningRate 0.0034 Epoch: 16 Global Step: 202450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:38,793-Speed 3294.37 samples/sec Loss 1.3337 LearningRate 0.0034 Epoch: 16 Global Step: 202460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:17:41,843-Speed 3358.68 samples/sec Loss 1.3770 LearningRate 0.0034 Epoch: 16 Global Step: 202470 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:17:44,970-Speed 3275.88 samples/sec Loss 1.3297 LearningRate 0.0034 Epoch: 16 Global Step: 202480 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:17:48,032-Speed 3345.71 samples/sec Loss 1.3851 LearningRate 0.0034 Epoch: 16 Global Step: 202490 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:17:51,155-Speed 3279.47 samples/sec Loss 1.3711 LearningRate 0.0034 Epoch: 16 Global Step: 202500 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:17:54,231-Speed 3329.94 samples/sec Loss 1.3721 LearningRate 0.0034 Epoch: 16 Global Step: 202510 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:17:57,290-Speed 3348.55 samples/sec Loss 1.3636 LearningRate 0.0034 Epoch: 16 Global Step: 202520 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:18:00,343-Speed 3355.55 samples/sec Loss 1.3157 LearningRate 0.0034 Epoch: 16 Global Step: 202530 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:18:03,432-Speed 3315.63 samples/sec Loss 1.3643 LearningRate 0.0034 Epoch: 16 Global Step: 202540 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:18:06,507-Speed 3331.48 samples/sec Loss 1.3558 LearningRate 0.0034 Epoch: 16 Global Step: 202550 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:18:09,556-Speed 3359.61 samples/sec Loss 1.3557 
LearningRate 0.0034 Epoch: 16 Global Step: 202560 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:18:12,679-Speed 3280.14 samples/sec Loss 1.3384 LearningRate 0.0034 Epoch: 16 Global Step: 202570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:15,726-Speed 3361.20 samples/sec Loss 1.3438 LearningRate 0.0034 Epoch: 16 Global Step: 202580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:18,778-Speed 3355.89 samples/sec Loss 1.4079 LearningRate 0.0034 Epoch: 16 Global Step: 202590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:21,845-Speed 3339.97 samples/sec Loss 1.3308 LearningRate 0.0034 Epoch: 16 Global Step: 202600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:24,923-Speed 3328.00 samples/sec Loss 1.3809 LearningRate 0.0034 Epoch: 16 Global Step: 202610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:28,006-Speed 3323.23 samples/sec Loss 1.3558 LearningRate 0.0034 Epoch: 16 Global Step: 202620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:31,125-Speed 3283.71 samples/sec Loss 1.3308 LearningRate 0.0034 Epoch: 16 Global Step: 202630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:34,217-Speed 3313.00 samples/sec Loss 1.3894 LearningRate 0.0034 Epoch: 16 Global Step: 202640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:37,307-Speed 3314.65 samples/sec Loss 1.3404 LearningRate 0.0034 Epoch: 16 Global Step: 202650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:40,407-Speed 3303.87 samples/sec Loss 1.3530 LearningRate 0.0034 Epoch: 16 Global Step: 202660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:43,524-Speed 3286.54 samples/sec Loss 1.3907 LearningRate 0.0034 Epoch: 16 Global Step: 202670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:18:46,598-Speed 3332.70 samples/sec Loss 1.3047 LearningRate 0.0034 Epoch: 16 Global Step: 
202680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:18:49,704-Speed 3298.07 samples/sec Loss 1.3437 LearningRate 0.0034 Epoch: 16 Global Step: 202690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:18:52,816-Speed 3290.90 samples/sec Loss 1.3913 LearningRate 0.0034 Epoch: 16 Global Step: 202700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:55,911-Speed 3310.26 samples/sec Loss 1.3214 LearningRate 0.0034 Epoch: 16 Global Step: 202710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:18:59,034-Speed 3279.95 samples/sec Loss 1.3233 LearningRate 0.0034 Epoch: 16 Global Step: 202720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:02,153-Speed 3283.61 samples/sec Loss 1.3752 LearningRate 0.0034 Epoch: 16 Global Step: 202730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:05,265-Speed 3291.31 samples/sec Loss 1.3661 LearningRate 0.0034 Epoch: 16 Global Step: 202740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:08,322-Speed 3351.32 samples/sec Loss 1.3878 LearningRate 0.0034 Epoch: 16 Global Step: 202750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:11,416-Speed 3310.47 samples/sec Loss 1.3857 LearningRate 0.0034 Epoch: 16 Global Step: 202760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:14,521-Speed 3299.82 samples/sec Loss 1.3205 LearningRate 0.0034 Epoch: 16 Global Step: 202770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:17,625-Speed 3300.14 samples/sec Loss 1.3652 LearningRate 0.0034 Epoch: 16 Global Step: 202780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:20,689-Speed 3342.77 samples/sec Loss 1.3870 LearningRate 0.0034 Epoch: 16 Global Step: 202790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:23,809-Speed 3283.05 samples/sec Loss 1.3421 LearningRate 0.0034 Epoch: 16 Global Step: 202800 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-27 19:19:26,905-Speed 3308.54 samples/sec Loss 1.3422 LearningRate 0.0034 Epoch: 16 Global Step: 202810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:19:30,045-Speed 3261.58 samples/sec Loss 1.3697 LearningRate 0.0034 Epoch: 16 Global Step: 202820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:19:33,098-Speed 3355.80 samples/sec Loss 1.3244 LearningRate 0.0034 Epoch: 16 Global Step: 202830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:19:36,157-Speed 3348.44 samples/sec Loss 1.3394 LearningRate 0.0034 Epoch: 16 Global Step: 202840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:19:39,220-Speed 3343.92 samples/sec Loss 1.3534 LearningRate 0.0034 Epoch: 16 Global Step: 202850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:42,390-Speed 3231.63 samples/sec Loss 1.3501 LearningRate 0.0034 Epoch: 16 Global Step: 202860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:45,441-Speed 3357.34 samples/sec Loss 1.3524 LearningRate 0.0034 Epoch: 16 Global Step: 202870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:48,543-Speed 3302.25 samples/sec Loss 1.3525 LearningRate 0.0034 Epoch: 16 Global Step: 202880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:51,630-Speed 3318.09 samples/sec Loss 1.3741 LearningRate 0.0034 Epoch: 16 Global Step: 202890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:54,767-Speed 3266.05 samples/sec Loss 1.3459 LearningRate 0.0034 Epoch: 16 Global Step: 202900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:19:57,847-Speed 3325.60 samples/sec Loss 1.3201 LearningRate 0.0034 Epoch: 16 Global Step: 202910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:00,959-Speed 3291.87 samples/sec Loss 1.3798 LearningRate 0.0034 Epoch: 16 Global Step: 202920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:20:04,069-Speed 3292.89 samples/sec Loss 1.3584 LearningRate 0.0034 Epoch: 16 Global Step: 202930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:07,191-Speed 3281.69 samples/sec Loss 1.3647 LearningRate 0.0034 Epoch: 16 Global Step: 202940 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:10,292-Speed 3302.59 samples/sec Loss 1.3716 LearningRate 0.0034 Epoch: 16 Global Step: 202950 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:13,425-Speed 3270.06 samples/sec Loss 1.3036 LearningRate 0.0033 Epoch: 16 Global Step: 202960 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:16,629-Speed 3197.19 samples/sec Loss 1.3227 LearningRate 0.0033 Epoch: 16 Global Step: 202970 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:19,714-Speed 3319.51 samples/sec Loss 1.3397 LearningRate 0.0033 Epoch: 16 Global Step: 202980 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:22,810-Speed 3308.46 samples/sec Loss 1.3049 LearningRate 0.0033 Epoch: 16 Global Step: 202990 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:25,907-Speed 3308.49 samples/sec Loss 1.3865 LearningRate 0.0033 Epoch: 16 Global Step: 203000 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:29,099-Speed 3208.20 samples/sec Loss 1.4458 LearningRate 0.0033 Epoch: 16 Global Step: 203010 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:32,241-Speed 3260.73 samples/sec Loss 1.4146 LearningRate 0.0033 Epoch: 16 Global Step: 203020 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:35,337-Speed 3308.54 samples/sec Loss 1.3939 LearningRate 0.0033 Epoch: 16 Global Step: 203030 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:20:38,397-Speed 3347.49 samples/sec Loss 1.3581 LearningRate 0.0033 Epoch: 16 Global Step: 203040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:41,517-Speed 3282.32 samples/sec Loss 1.4055 
LearningRate 0.0033 Epoch: 16 Global Step: 203050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:44,608-Speed 3314.76 samples/sec Loss 1.4011 LearningRate 0.0033 Epoch: 16 Global Step: 203060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:47,708-Speed 3303.51 samples/sec Loss 1.3445 LearningRate 0.0033 Epoch: 16 Global Step: 203070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:50,840-Speed 3271.14 samples/sec Loss 1.3657 LearningRate 0.0033 Epoch: 16 Global Step: 203080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:53,913-Speed 3334.92 samples/sec Loss 1.4430 LearningRate 0.0033 Epoch: 16 Global Step: 203090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:20:56,960-Speed 3361.49 samples/sec Loss 1.4319 LearningRate 0.0033 Epoch: 16 Global Step: 203100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:00,120-Speed 3240.80 samples/sec Loss 1.3846 LearningRate 0.0033 Epoch: 16 Global Step: 203110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:03,220-Speed 3304.76 samples/sec Loss 1.3372 LearningRate 0.0033 Epoch: 16 Global Step: 203120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:06,348-Speed 3274.39 samples/sec Loss 1.3429 LearningRate 0.0033 Epoch: 16 Global Step: 203130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:09,408-Speed 3347.34 samples/sec Loss 1.3248 LearningRate 0.0033 Epoch: 16 Global Step: 203140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:21:12,506-Speed 3307.48 samples/sec Loss 1.3452 LearningRate 0.0033 Epoch: 16 Global Step: 203150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:21:15,574-Speed 3337.98 samples/sec Loss 1.3786 LearningRate 0.0033 Epoch: 16 Global Step: 203160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:18,710-Speed 3267.16 samples/sec Loss 1.3348 LearningRate 0.0033 Epoch: 16 Global Step: 
203170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:21,764-Speed 3352.97 samples/sec Loss 1.3416 LearningRate 0.0033 Epoch: 16 Global Step: 203180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:24,876-Speed 3291.71 samples/sec Loss 1.3586 LearningRate 0.0033 Epoch: 16 Global Step: 203190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:27,988-Speed 3292.29 samples/sec Loss 1.4014 LearningRate 0.0033 Epoch: 16 Global Step: 203200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:31,121-Speed 3269.23 samples/sec Loss 1.3633 LearningRate 0.0033 Epoch: 16 Global Step: 203210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:34,201-Speed 3325.24 samples/sec Loss 1.3879 LearningRate 0.0033 Epoch: 16 Global Step: 203220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:37,285-Speed 3322.18 samples/sec Loss 1.3828 LearningRate 0.0033 Epoch: 16 Global Step: 203230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:40,382-Speed 3306.97 samples/sec Loss 1.3275 LearningRate 0.0033 Epoch: 16 Global Step: 203240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:43,473-Speed 3313.72 samples/sec Loss 1.3962 LearningRate 0.0033 Epoch: 16 Global Step: 203250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:46,588-Speed 3289.07 samples/sec Loss 1.3695 LearningRate 0.0033 Epoch: 16 Global Step: 203260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:21:49,691-Speed 3300.70 samples/sec Loss 1.4090 LearningRate 0.0033 Epoch: 16 Global Step: 203270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:52,864-Speed 3228.52 samples/sec Loss 1.3511 LearningRate 0.0033 Epoch: 16 Global Step: 203280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:21:55,985-Speed 3282.27 samples/sec Loss 1.3785 LearningRate 0.0033 Epoch: 16 Global Step: 203290 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-04-27 19:21:59,077-Speed 3311.64 samples/sec Loss 1.3190 LearningRate 0.0033 Epoch: 16 Global Step: 203300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:02,216-Speed 3263.42 samples/sec Loss 1.4191 LearningRate 0.0033 Epoch: 16 Global Step: 203310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:05,404-Speed 3212.91 samples/sec Loss 1.3854 LearningRate 0.0033 Epoch: 16 Global Step: 203320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:08,492-Speed 3317.41 samples/sec Loss 1.3145 LearningRate 0.0033 Epoch: 16 Global Step: 203330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:11,647-Speed 3246.27 samples/sec Loss 1.3808 LearningRate 0.0033 Epoch: 16 Global Step: 203340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:14,859-Speed 3189.77 samples/sec Loss 1.3803 LearningRate 0.0033 Epoch: 16 Global Step: 203350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:17,967-Speed 3295.68 samples/sec Loss 1.4206 LearningRate 0.0033 Epoch: 16 Global Step: 203360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:21,059-Speed 3312.34 samples/sec Loss 1.3648 LearningRate 0.0033 Epoch: 16 Global Step: 203370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:22:24,143-Speed 3322.15 samples/sec Loss 1.3652 LearningRate 0.0033 Epoch: 16 Global Step: 203380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:27,266-Speed 3279.85 samples/sec Loss 1.3177 LearningRate 0.0033 Epoch: 16 Global Step: 203390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:30,424-Speed 3243.07 samples/sec Loss 1.3921 LearningRate 0.0033 Epoch: 16 Global Step: 203400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:33,542-Speed 3285.42 samples/sec Loss 1.3834 LearningRate 0.0033 Epoch: 16 Global Step: 203410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:22:36,688-Speed 3256.15 samples/sec Loss 1.4269 LearningRate 0.0033 Epoch: 16 Global Step: 203420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:39,898-Speed 3190.33 samples/sec Loss 1.3798 LearningRate 0.0033 Epoch: 16 Global Step: 203430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:43,071-Speed 3228.77 samples/sec Loss 1.4039 LearningRate 0.0033 Epoch: 16 Global Step: 203440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:46,233-Speed 3238.94 samples/sec Loss 1.3564 LearningRate 0.0033 Epoch: 16 Global Step: 203450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:22:49,290-Speed 3351.21 samples/sec Loss 1.3544 LearningRate 0.0033 Epoch: 16 Global Step: 203460 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:22:52,362-Speed 3333.72 samples/sec Loss 1.3726 LearningRate 0.0033 Epoch: 16 Global Step: 203470 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:22:55,487-Speed 3278.12 samples/sec Loss 1.4188 LearningRate 0.0033 Epoch: 16 Global Step: 203480 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:22:58,540-Speed 3355.25 samples/sec Loss 1.3470 LearningRate 0.0033 Epoch: 16 Global Step: 203490 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:23:01,611-Speed 3335.33 samples/sec Loss 1.3985 LearningRate 0.0033 Epoch: 16 Global Step: 203500 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:23:04,744-Speed 3270.00 samples/sec Loss 1.3912 LearningRate 0.0033 Epoch: 16 Global Step: 203510 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:23:07,850-Speed 3297.27 samples/sec Loss 1.3822 LearningRate 0.0033 Epoch: 16 Global Step: 203520 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:23:10,924-Speed 3332.96 samples/sec Loss 1.3257 LearningRate 0.0033 Epoch: 16 Global Step: 203530 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:23:14,055-Speed 3271.07 samples/sec Loss 1.3611 
LearningRate 0.0033 Epoch: 16 Global Step: 203540 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:23:17,193-Speed 3264.39 samples/sec Loss 1.3030 LearningRate 0.0033 Epoch: 16 Global Step: 203550 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:23:20,257-Speed 3343.53 samples/sec Loss 1.3549 LearningRate 0.0033 Epoch: 16 Global Step: 203560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:23,409-Speed 3249.61 samples/sec Loss 1.3741 LearningRate 0.0033 Epoch: 16 Global Step: 203570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:26,598-Speed 3212.08 samples/sec Loss 1.3764 LearningRate 0.0033 Epoch: 16 Global Step: 203580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:29,766-Speed 3233.31 samples/sec Loss 1.3721 LearningRate 0.0033 Epoch: 16 Global Step: 203590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:32,842-Speed 3329.62 samples/sec Loss 1.3913 LearningRate 0.0033 Epoch: 16 Global Step: 203600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:35,979-Speed 3266.08 samples/sec Loss 1.3265 LearningRate 0.0033 Epoch: 16 Global Step: 203610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:39,083-Speed 3298.99 samples/sec Loss 1.4528 LearningRate 0.0033 Epoch: 16 Global Step: 203620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:42,213-Speed 3273.67 samples/sec Loss 1.3361 LearningRate 0.0033 Epoch: 16 Global Step: 203630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:45,293-Speed 3326.01 samples/sec Loss 1.3334 LearningRate 0.0032 Epoch: 16 Global Step: 203640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:48,439-Speed 3255.20 samples/sec Loss 1.3764 LearningRate 0.0032 Epoch: 16 Global Step: 203650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:51,558-Speed 3283.94 samples/sec Loss 1.4145 LearningRate 0.0032 Epoch: 16 Global Step: 
203660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:23:54,700-Speed 3260.21 samples/sec Loss 1.4251 LearningRate 0.0032 Epoch: 16 Global Step: 203670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:23:57,806-Speed 3298.22 samples/sec Loss 1.3296 LearningRate 0.0032 Epoch: 16 Global Step: 203680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:00,918-Speed 3291.30 samples/sec Loss 1.3624 LearningRate 0.0032 Epoch: 16 Global Step: 203690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:04,060-Speed 3260.12 samples/sec Loss 1.3654 LearningRate 0.0032 Epoch: 16 Global Step: 203700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:07,237-Speed 3223.55 samples/sec Loss 1.3338 LearningRate 0.0032 Epoch: 16 Global Step: 203710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:10,330-Speed 3312.26 samples/sec Loss 1.3730 LearningRate 0.0032 Epoch: 16 Global Step: 203720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:13,437-Speed 3296.70 samples/sec Loss 1.3594 LearningRate 0.0032 Epoch: 16 Global Step: 203730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:16,634-Speed 3204.40 samples/sec Loss 1.3518 LearningRate 0.0032 Epoch: 16 Global Step: 203740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:19,830-Speed 3204.84 samples/sec Loss 1.3852 LearningRate 0.0032 Epoch: 16 Global Step: 203750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:22,910-Speed 3324.83 samples/sec Loss 1.3624 LearningRate 0.0032 Epoch: 16 Global Step: 203760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:24:26,074-Speed 3237.44 samples/sec Loss 1.3460 LearningRate 0.0032 Epoch: 16 Global Step: 203770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:29,251-Speed 3224.65 samples/sec Loss 1.3446 LearningRate 0.0032 Epoch: 16 Global Step: 203780 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-27 19:24:32,325-Speed 3332.46 samples/sec Loss 1.3381 LearningRate 0.0032 Epoch: 16 Global Step: 203790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:35,509-Speed 3218.27 samples/sec Loss 1.3826 LearningRate 0.0032 Epoch: 16 Global Step: 203800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:38,635-Speed 3276.91 samples/sec Loss 1.3590 LearningRate 0.0032 Epoch: 16 Global Step: 203810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:41,716-Speed 3324.10 samples/sec Loss 1.3445 LearningRate 0.0032 Epoch: 16 Global Step: 203820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:44,834-Speed 3285.57 samples/sec Loss 1.3933 LearningRate 0.0032 Epoch: 16 Global Step: 203830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:47,952-Speed 3285.52 samples/sec Loss 1.4171 LearningRate 0.0032 Epoch: 16 Global Step: 203840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:51,025-Speed 3333.36 samples/sec Loss 1.3509 LearningRate 0.0032 Epoch: 16 Global Step: 203850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:54,132-Speed 3296.07 samples/sec Loss 1.3945 LearningRate 0.0032 Epoch: 16 Global Step: 203860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:24:57,205-Speed 3333.13 samples/sec Loss 1.3831 LearningRate 0.0032 Epoch: 16 Global Step: 203870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 19:25:00,281-Speed 3330.11 samples/sec Loss 1.3912 LearningRate 0.0032 Epoch: 16 Global Step: 203880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:03,386-Speed 3299.58 samples/sec Loss 1.3678 LearningRate 0.0032 Epoch: 16 Global Step: 203890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:06,541-Speed 3246.07 samples/sec Loss 1.3263 LearningRate 0.0032 Epoch: 16 Global Step: 203900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:25:09,647-Speed 3298.45 samples/sec Loss 1.3271 LearningRate 0.0032 Epoch: 16 Global Step: 203910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:12,743-Speed 3308.01 samples/sec Loss 1.3939 LearningRate 0.0032 Epoch: 16 Global Step: 203920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:15,953-Speed 3191.22 samples/sec Loss 1.4053 LearningRate 0.0032 Epoch: 16 Global Step: 203930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:19,036-Speed 3322.77 samples/sec Loss 1.3244 LearningRate 0.0032 Epoch: 16 Global Step: 203940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:22,096-Speed 3347.93 samples/sec Loss 1.4159 LearningRate 0.0032 Epoch: 16 Global Step: 203950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:25,229-Speed 3269.75 samples/sec Loss 1.3294 LearningRate 0.0032 Epoch: 16 Global Step: 203960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:28,339-Speed 3293.50 samples/sec Loss 1.3835 LearningRate 0.0032 Epoch: 16 Global Step: 203970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:31,407-Speed 3338.59 samples/sec Loss 1.3512 LearningRate 0.0032 Epoch: 16 Global Step: 203980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:25:34,482-Speed 3330.29 samples/sec Loss 1.3646 LearningRate 0.0032 Epoch: 16 Global Step: 203990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:25:37,624-Speed 3260.26 samples/sec Loss 1.3643 LearningRate 0.0032 Epoch: 16 Global Step: 204000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:25:40,731-Speed 3296.83 samples/sec Loss 1.3634 LearningRate 0.0032 Epoch: 16 Global Step: 204010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:25:43,867-Speed 3266.71 samples/sec Loss 1.3755 LearningRate 0.0032 Epoch: 16 Global Step: 204020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:25:46,941-Speed 3332.80 samples/sec Loss 
1.4204 LearningRate 0.0032 Epoch: 16 Global Step: 204030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:25:50,013-Speed 3334.26 samples/sec Loss 1.3414 LearningRate 0.0032 Epoch: 16 Global Step: 204040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:25:53,198-Speed 3215.62 samples/sec Loss 1.3773 LearningRate 0.0032 Epoch: 16 Global Step: 204050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:56,310-Speed 3291.42 samples/sec Loss 1.3165 LearningRate 0.0032 Epoch: 16 Global Step: 204060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:25:59,383-Speed 3333.34 samples/sec Loss 1.3740 LearningRate 0.0032 Epoch: 16 Global Step: 204070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:02,462-Speed 3326.42 samples/sec Loss 1.3544 LearningRate 0.0032 Epoch: 16 Global Step: 204080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:05,632-Speed 3231.91 samples/sec Loss 1.3680 LearningRate 0.0032 Epoch: 16 Global Step: 204090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:08,719-Speed 3318.58 samples/sec Loss 1.3242 LearningRate 0.0032 Epoch: 16 Global Step: 204100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:11,837-Speed 3284.43 samples/sec Loss 1.3615 LearningRate 0.0032 Epoch: 16 Global Step: 204110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:15,027-Speed 3210.54 samples/sec Loss 1.3676 LearningRate 0.0032 Epoch: 16 Global Step: 204120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:18,270-Speed 3158.52 samples/sec Loss 1.3554 LearningRate 0.0032 Epoch: 16 Global Step: 204130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:21,317-Speed 3362.13 samples/sec Loss 1.3728 LearningRate 0.0032 Epoch: 16 Global Step: 204140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:24,407-Speed 3314.75 samples/sec Loss 1.3252 LearningRate 0.0032 Epoch: 16 Global 
Step: 204150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:26:27,535-Speed 3275.52 samples/sec Loss 1.3942 LearningRate 0.0032 Epoch: 16 Global Step: 204160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:30,653-Speed 3284.99 samples/sec Loss 1.3847 LearningRate 0.0032 Epoch: 16 Global Step: 204170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:33,717-Speed 3342.85 samples/sec Loss 1.3955 LearningRate 0.0032 Epoch: 16 Global Step: 204180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:36,831-Speed 3289.65 samples/sec Loss 1.3736 LearningRate 0.0032 Epoch: 16 Global Step: 204190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:39,930-Speed 3304.53 samples/sec Loss 1.3745 LearningRate 0.0032 Epoch: 16 Global Step: 204200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:43,046-Speed 3288.14 samples/sec Loss 1.3782 LearningRate 0.0032 Epoch: 16 Global Step: 204210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:46,184-Speed 3264.56 samples/sec Loss 1.3825 LearningRate 0.0032 Epoch: 16 Global Step: 204220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:49,356-Speed 3228.74 samples/sec Loss 1.3233 LearningRate 0.0032 Epoch: 16 Global Step: 204230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:52,508-Speed 3249.58 samples/sec Loss 1.3347 LearningRate 0.0032 Epoch: 16 Global Step: 204240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:55,612-Speed 3299.83 samples/sec Loss 1.3922 LearningRate 0.0032 Epoch: 16 Global Step: 204250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:26:58,696-Speed 3322.42 samples/sec Loss 1.3345 LearningRate 0.0032 Epoch: 16 Global Step: 204260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:27:01,820-Speed 3278.46 samples/sec Loss 1.3422 LearningRate 0.0032 Epoch: 16 Global Step: 204270 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 19:27:04,968-Speed 3253.54 samples/sec Loss 1.3739 LearningRate 0.0032 Epoch: 16 Global Step: 204280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:08,121-Speed 3248.49 samples/sec Loss 1.3991 LearningRate 0.0032 Epoch: 16 Global Step: 204290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:11,189-Speed 3339.33 samples/sec Loss 1.3621 LearningRate 0.0032 Epoch: 16 Global Step: 204300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:14,302-Speed 3290.38 samples/sec Loss 1.3677 LearningRate 0.0032 Epoch: 16 Global Step: 204310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:17,396-Speed 3309.69 samples/sec Loss 1.3770 LearningRate 0.0032 Epoch: 16 Global Step: 204320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:20,469-Speed 3334.30 samples/sec Loss 1.4062 LearningRate 0.0031 Epoch: 16 Global Step: 204330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:23,669-Speed 3200.59 samples/sec Loss 1.3988 LearningRate 0.0031 Epoch: 16 Global Step: 204340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:26,782-Speed 3290.61 samples/sec Loss 1.3937 LearningRate 0.0031 Epoch: 16 Global Step: 204350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:29,870-Speed 3316.94 samples/sec Loss 1.3537 LearningRate 0.0031 Epoch: 16 Global Step: 204360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:32,936-Speed 3340.48 samples/sec Loss 1.3664 LearningRate 0.0031 Epoch: 16 Global Step: 204370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:27:36,021-Speed 3320.31 samples/sec Loss 1.3581 LearningRate 0.0031 Epoch: 16 Global Step: 204380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:27:39,090-Speed 3337.60 samples/sec Loss 1.3795 LearningRate 0.0031 Epoch: 16 Global Step: 204390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
19:27:42,137-Speed 3361.15 samples/sec Loss 1.4101 LearningRate 0.0031 Epoch: 16 Global Step: 204400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:45,217-Speed 3326.34 samples/sec Loss 1.3196 LearningRate 0.0031 Epoch: 16 Global Step: 204410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:48,328-Speed 3292.38 samples/sec Loss 1.4079 LearningRate 0.0031 Epoch: 16 Global Step: 204420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:51,516-Speed 3212.67 samples/sec Loss 1.4010 LearningRate 0.0031 Epoch: 16 Global Step: 204430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:54,630-Speed 3289.09 samples/sec Loss 1.4278 LearningRate 0.0031 Epoch: 16 Global Step: 204440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:27:57,747-Speed 3286.56 samples/sec Loss 1.3800 LearningRate 0.0031 Epoch: 16 Global Step: 204450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:28:00,836-Speed 3316.85 samples/sec Loss 1.3613 LearningRate 0.0031 Epoch: 16 Global Step: 204460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:28:03,928-Speed 3312.15 samples/sec Loss 1.3753 LearningRate 0.0031 Epoch: 16 Global Step: 204470 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:07,067-Speed 3263.60 samples/sec Loss 1.3458 LearningRate 0.0031 Epoch: 16 Global Step: 204480 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:10,132-Speed 3341.96 samples/sec Loss 1.3816 LearningRate 0.0031 Epoch: 16 Global Step: 204490 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:13,318-Speed 3214.93 samples/sec Loss 1.3768 LearningRate 0.0031 Epoch: 16 Global Step: 204500 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:16,403-Speed 3319.95 samples/sec Loss 1.3639 LearningRate 0.0031 Epoch: 16 Global Step: 204510 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:19,554-Speed 3250.22 samples/sec Loss 
1.4085 LearningRate 0.0031 Epoch: 16 Global Step: 204520 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:22,627-Speed 3333.91 samples/sec Loss 1.3823 LearningRate 0.0031 Epoch: 16 Global Step: 204530 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:25,720-Speed 3311.51 samples/sec Loss 1.3514 LearningRate 0.0031 Epoch: 16 Global Step: 204540 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:28,832-Speed 3292.41 samples/sec Loss 1.3878 LearningRate 0.0031 Epoch: 16 Global Step: 204550 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:31,946-Speed 3288.94 samples/sec Loss 1.4053 LearningRate 0.0031 Epoch: 16 Global Step: 204560 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:35,055-Speed 3295.10 samples/sec Loss 1.3846 LearningRate 0.0031 Epoch: 16 Global Step: 204570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:28:38,207-Speed 3249.76 samples/sec Loss 1.3638 LearningRate 0.0031 Epoch: 16 Global Step: 204580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:28:41,356-Speed 3252.78 samples/sec Loss 1.3754 LearningRate 0.0031 Epoch: 16 Global Step: 204590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:28:44,422-Speed 3340.65 samples/sec Loss 1.3575 LearningRate 0.0031 Epoch: 16 Global Step: 204600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:28:47,516-Speed 3311.04 samples/sec Loss 1.3538 LearningRate 0.0031 Epoch: 16 Global Step: 204610 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:50,597-Speed 3324.42 samples/sec Loss 1.3344 LearningRate 0.0031 Epoch: 16 Global Step: 204620 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:53,702-Speed 3299.57 samples/sec Loss 1.3443 LearningRate 0.0031 Epoch: 16 Global Step: 204630 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:56,831-Speed 3273.40 samples/sec Loss 1.3800 LearningRate 0.0031 Epoch: 16 Global Step: 
204640 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:28:59,911-Speed 3325.76 samples/sec Loss 1.3311 LearningRate 0.0031 Epoch: 16 Global Step: 204650 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:03,010-Speed 3304.99 samples/sec Loss 1.3657 LearningRate 0.0031 Epoch: 16 Global Step: 204660 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:06,177-Speed 3234.44 samples/sec Loss 1.3694 LearningRate 0.0031 Epoch: 16 Global Step: 204670 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:09,270-Speed 3312.09 samples/sec Loss 1.4155 LearningRate 0.0031 Epoch: 16 Global Step: 204680 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:12,389-Speed 3284.09 samples/sec Loss 1.4084 LearningRate 0.0031 Epoch: 16 Global Step: 204690 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:15,471-Speed 3323.33 samples/sec Loss 1.3862 LearningRate 0.0031 Epoch: 16 Global Step: 204700 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:18,613-Speed 3260.57 samples/sec Loss 1.3589 LearningRate 0.0031 Epoch: 16 Global Step: 204710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:29:21,681-Speed 3338.74 samples/sec Loss 1.3771 LearningRate 0.0031 Epoch: 16 Global Step: 204720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:29:24,788-Speed 3296.91 samples/sec Loss 1.3639 LearningRate 0.0031 Epoch: 16 Global Step: 204730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:29:27,855-Speed 3339.46 samples/sec Loss 1.3554 LearningRate 0.0031 Epoch: 16 Global Step: 204740 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:31,058-Speed 3197.66 samples/sec Loss 1.3927 LearningRate 0.0031 Epoch: 16 Global Step: 204750 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:34,173-Speed 3288.29 samples/sec Loss 1.3889 LearningRate 0.0031 Epoch: 16 Global Step: 204760 Fp16 Grad Scale: 8192 Required: 4 hours 
Training: 2022-04-27 19:29:37,330-Speed 3244.67 samples/sec Loss 1.3874 LearningRate 0.0031 Epoch: 16 Global Step: 204770 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:40,470-Speed 3261.56 samples/sec Loss 1.3764 LearningRate 0.0031 Epoch: 16 Global Step: 204780 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:43,595-Speed 3277.98 samples/sec Loss 1.4246 LearningRate 0.0031 Epoch: 16 Global Step: 204790 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:46,754-Speed 3242.59 samples/sec Loss 1.3535 LearningRate 0.0031 Epoch: 16 Global Step: 204800 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:49,863-Speed 3294.85 samples/sec Loss 1.3805 LearningRate 0.0031 Epoch: 16 Global Step: 204810 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:53,045-Speed 3219.38 samples/sec Loss 1.4080 LearningRate 0.0031 Epoch: 16 Global Step: 204820 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:56,141-Speed 3308.46 samples/sec Loss 1.3994 LearningRate 0.0031 Epoch: 16 Global Step: 204830 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:29:59,245-Speed 3299.48 samples/sec Loss 1.3415 LearningRate 0.0031 Epoch: 16 Global Step: 204840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:02,353-Speed 3296.03 samples/sec Loss 1.4251 LearningRate 0.0031 Epoch: 16 Global Step: 204850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:05,472-Speed 3284.77 samples/sec Loss 1.3690 LearningRate 0.0031 Epoch: 16 Global Step: 204860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:08,598-Speed 3276.79 samples/sec Loss 1.3863 LearningRate 0.0031 Epoch: 16 Global Step: 204870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:11,693-Speed 3308.51 samples/sec Loss 1.3344 LearningRate 0.0031 Epoch: 16 Global Step: 204880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:14,786-Speed 3312.50 
samples/sec Loss 1.3871 LearningRate 0.0031 Epoch: 16 Global Step: 204890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:17,923-Speed 3265.93 samples/sec Loss 1.3533 LearningRate 0.0031 Epoch: 16 Global Step: 204900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:21,005-Speed 3322.44 samples/sec Loss 1.3126 LearningRate 0.0031 Epoch: 16 Global Step: 204910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:24,108-Speed 3301.33 samples/sec Loss 1.3833 LearningRate 0.0031 Epoch: 16 Global Step: 204920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:27,260-Speed 3249.98 samples/sec Loss 1.4358 LearningRate 0.0031 Epoch: 16 Global Step: 204930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:30,398-Speed 3264.39 samples/sec Loss 1.4338 LearningRate 0.0031 Epoch: 16 Global Step: 204940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:30:33,486-Speed 3316.95 samples/sec Loss 1.3405 LearningRate 0.0031 Epoch: 16 Global Step: 204950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:36,562-Speed 3330.27 samples/sec Loss 1.3894 LearningRate 0.0031 Epoch: 16 Global Step: 204960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:39,675-Speed 3289.96 samples/sec Loss 1.3374 LearningRate 0.0031 Epoch: 16 Global Step: 204970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:42,795-Speed 3282.99 samples/sec Loss 1.3849 LearningRate 0.0031 Epoch: 16 Global Step: 204980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:45,901-Speed 3298.22 samples/sec Loss 1.4211 LearningRate 0.0031 Epoch: 16 Global Step: 204990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:49,016-Speed 3288.18 samples/sec Loss 1.4149 LearningRate 0.0031 Epoch: 16 Global Step: 205000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:52,187-Speed 3230.28 samples/sec Loss 1.3348 LearningRate 0.0031 
Epoch: 16 Global Step: 205010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:55,291-Speed 3300.37 samples/sec Loss 1.3921 LearningRate 0.0031 Epoch: 16 Global Step: 205020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:30:58,460-Speed 3232.17 samples/sec Loss 1.3825 LearningRate 0.0031 Epoch: 16 Global Step: 205030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:01,572-Speed 3291.42 samples/sec Loss 1.4395 LearningRate 0.0030 Epoch: 16 Global Step: 205040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:04,628-Speed 3351.10 samples/sec Loss 1.3981 LearningRate 0.0030 Epoch: 16 Global Step: 205050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:07,701-Speed 3333.64 samples/sec Loss 1.3134 LearningRate 0.0030 Epoch: 16 Global Step: 205060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:10,787-Speed 3319.08 samples/sec Loss 1.4401 LearningRate 0.0030 Epoch: 16 Global Step: 205070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:13,896-Speed 3295.32 samples/sec Loss 1.3616 LearningRate 0.0030 Epoch: 16 Global Step: 205080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:17,054-Speed 3243.10 samples/sec Loss 1.3926 LearningRate 0.0030 Epoch: 16 Global Step: 205090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:20,159-Speed 3298.27 samples/sec Loss 1.3793 LearningRate 0.0030 Epoch: 16 Global Step: 205100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:23,360-Speed 3200.12 samples/sec Loss 1.3886 LearningRate 0.0030 Epoch: 16 Global Step: 205110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:26,478-Speed 3285.98 samples/sec Loss 1.3344 LearningRate 0.0030 Epoch: 16 Global Step: 205120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:29,660-Speed 3218.43 samples/sec Loss 1.3570 LearningRate 0.0030 Epoch: 16 Global Step: 205130 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:32,793-Speed 3270.39 samples/sec Loss 1.4096 LearningRate 0.0030 Epoch: 16 Global Step: 205140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:31:35,866-Speed 3332.85 samples/sec Loss 1.4071 LearningRate 0.0030 Epoch: 16 Global Step: 205150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:31:38,964-Speed 3306.25 samples/sec Loss 1.3959 LearningRate 0.0030 Epoch: 16 Global Step: 205160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:31:42,100-Speed 3266.34 samples/sec Loss 1.3876 LearningRate 0.0030 Epoch: 16 Global Step: 205170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:31:45,214-Speed 3289.68 samples/sec Loss 1.3243 LearningRate 0.0030 Epoch: 16 Global Step: 205180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:31:48,379-Speed 3236.24 samples/sec Loss 1.3766 LearningRate 0.0030 Epoch: 16 Global Step: 205190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:31:51,515-Speed 3265.52 samples/sec Loss 1.3469 LearningRate 0.0030 Epoch: 16 Global Step: 205200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:31:54,598-Speed 3323.16 samples/sec Loss 1.3137 LearningRate 0.0030 Epoch: 16 Global Step: 205210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:31:57,691-Speed 3312.10 samples/sec Loss 1.3639 LearningRate 0.0030 Epoch: 16 Global Step: 205220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:00,784-Speed 3311.35 samples/sec Loss 1.3721 LearningRate 0.0030 Epoch: 16 Global Step: 205230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:03,921-Speed 3265.31 samples/sec Loss 1.3536 LearningRate 0.0030 Epoch: 16 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:07,066-Speed 3257.26 samples/sec Loss 1.3917 LearningRate 0.0030 Epoch: 16 Global Step: 205250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-04-27 19:32:10,159-Speed 3311.20 samples/sec Loss 1.3454 LearningRate 0.0030 Epoch: 16 Global Step: 205260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:13,285-Speed 3277.16 samples/sec Loss 1.3789 LearningRate 0.0030 Epoch: 16 Global Step: 205270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:16,348-Speed 3344.64 samples/sec Loss 1.4084 LearningRate 0.0030 Epoch: 16 Global Step: 205280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:19,493-Speed 3256.20 samples/sec Loss 1.3583 LearningRate 0.0030 Epoch: 16 Global Step: 205290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:22,570-Speed 3329.54 samples/sec Loss 1.4281 LearningRate 0.0030 Epoch: 16 Global Step: 205300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:25,726-Speed 3244.97 samples/sec Loss 1.4000 LearningRate 0.0030 Epoch: 16 Global Step: 205310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:28,885-Speed 3242.65 samples/sec Loss 1.3780 LearningRate 0.0030 Epoch: 16 Global Step: 205320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:31,961-Speed 3330.26 samples/sec Loss 1.3427 LearningRate 0.0030 Epoch: 16 Global Step: 205330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:35,025-Speed 3343.42 samples/sec Loss 1.3258 LearningRate 0.0030 Epoch: 16 Global Step: 205340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:32:38,158-Speed 3269.36 samples/sec Loss 1.4051 LearningRate 0.0030 Epoch: 16 Global Step: 205350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:41,235-Speed 3328.12 samples/sec Loss 1.3456 LearningRate 0.0030 Epoch: 16 Global Step: 205360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:44,354-Speed 3284.27 samples/sec Loss 1.3893 LearningRate 0.0030 Epoch: 16 Global Step: 205370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:47,421-Speed 3340.20 
samples/sec Loss 1.3618 LearningRate 0.0030 Epoch: 16 Global Step: 205380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:50,563-Speed 3260.30 samples/sec Loss 1.3525 LearningRate 0.0030 Epoch: 16 Global Step: 205390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:53,744-Speed 3219.75 samples/sec Loss 1.3666 LearningRate 0.0030 Epoch: 16 Global Step: 205400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:56,820-Speed 3329.92 samples/sec Loss 1.3703 LearningRate 0.0030 Epoch: 16 Global Step: 205410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:32:59,935-Speed 3288.66 samples/sec Loss 1.3767 LearningRate 0.0030 Epoch: 16 Global Step: 205420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:03,105-Speed 3231.19 samples/sec Loss 1.4040 LearningRate 0.0030 Epoch: 16 Global Step: 205430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:06,282-Speed 3223.91 samples/sec Loss 1.3590 LearningRate 0.0030 Epoch: 16 Global Step: 205440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:09,371-Speed 3316.35 samples/sec Loss 1.3717 LearningRate 0.0030 Epoch: 16 Global Step: 205450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:12,490-Speed 3283.51 samples/sec Loss 1.3895 LearningRate 0.0030 Epoch: 16 Global Step: 205460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:15,608-Speed 3286.12 samples/sec Loss 1.4132 LearningRate 0.0030 Epoch: 16 Global Step: 205470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:18,815-Speed 3193.49 samples/sec Loss 1.3926 LearningRate 0.0030 Epoch: 16 Global Step: 205480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:21,922-Speed 3297.07 samples/sec Loss 1.3426 LearningRate 0.0030 Epoch: 16 Global Step: 205490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:24,995-Speed 3332.48 samples/sec Loss 1.3842 LearningRate 0.0030 
Epoch: 16 Global Step: 205500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:28,189-Speed 3207.05 samples/sec Loss 1.3880 LearningRate 0.0030 Epoch: 16 Global Step: 205510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:31,281-Speed 3312.52 samples/sec Loss 1.3554 LearningRate 0.0030 Epoch: 16 Global Step: 205520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:33:34,365-Speed 3322.33 samples/sec Loss 1.3301 LearningRate 0.0030 Epoch: 16 Global Step: 205530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:37,519-Speed 3246.97 samples/sec Loss 1.3918 LearningRate 0.0030 Epoch: 16 Global Step: 205540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:40,633-Speed 3289.26 samples/sec Loss 1.3530 LearningRate 0.0030 Epoch: 16 Global Step: 205550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:43,815-Speed 3219.26 samples/sec Loss 1.3434 LearningRate 0.0030 Epoch: 16 Global Step: 205560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:46,926-Speed 3293.33 samples/sec Loss 1.3516 LearningRate 0.0030 Epoch: 16 Global Step: 205570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:50,070-Speed 3257.52 samples/sec Loss 1.3668 LearningRate 0.0030 Epoch: 16 Global Step: 205580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:53,274-Speed 3197.27 samples/sec Loss 1.3514 LearningRate 0.0030 Epoch: 16 Global Step: 205590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:56,347-Speed 3332.91 samples/sec Loss 1.3769 LearningRate 0.0030 Epoch: 16 Global Step: 205600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:33:59,469-Speed 3281.03 samples/sec Loss 1.3883 LearningRate 0.0030 Epoch: 16 Global Step: 205610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:02,626-Speed 3244.86 samples/sec Loss 1.3637 LearningRate 0.0030 Epoch: 16 Global Step: 205620 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:05,724-Speed 3306.37 samples/sec Loss 1.3726 LearningRate 0.0030 Epoch: 16 Global Step: 205630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:08,816-Speed 3313.35 samples/sec Loss 1.4309 LearningRate 0.0030 Epoch: 16 Global Step: 205640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:11,978-Speed 3239.81 samples/sec Loss 1.3691 LearningRate 0.0030 Epoch: 16 Global Step: 205650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:15,071-Speed 3310.88 samples/sec Loss 1.3783 LearningRate 0.0030 Epoch: 16 Global Step: 205660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:18,219-Speed 3254.31 samples/sec Loss 1.3339 LearningRate 0.0030 Epoch: 16 Global Step: 205670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:21,354-Speed 3267.61 samples/sec Loss 1.3619 LearningRate 0.0030 Epoch: 16 Global Step: 205680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:24,497-Speed 3258.76 samples/sec Loss 1.3701 LearningRate 0.0030 Epoch: 16 Global Step: 205690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:27,635-Speed 3263.51 samples/sec Loss 1.3515 LearningRate 0.0030 Epoch: 16 Global Step: 205700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:30,739-Speed 3300.52 samples/sec Loss 1.4006 LearningRate 0.0030 Epoch: 16 Global Step: 205710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:34:33,857-Speed 3285.23 samples/sec Loss 1.3954 LearningRate 0.0030 Epoch: 16 Global Step: 205720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:34:36,946-Speed 3316.51 samples/sec Loss 1.3568 LearningRate 0.0030 Epoch: 16 Global Step: 205730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:40,041-Speed 3308.61 samples/sec Loss 1.4052 LearningRate 0.0030 Epoch: 16 Global Step: 205740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-04-27 19:34:43,163-Speed 3281.80 samples/sec Loss 1.3524 LearningRate 0.0030 Epoch: 16 Global Step: 205750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:46,257-Speed 3310.11 samples/sec Loss 1.3716 LearningRate 0.0029 Epoch: 16 Global Step: 205760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:49,376-Speed 3284.17 samples/sec Loss 1.3969 LearningRate 0.0029 Epoch: 16 Global Step: 205770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:52,491-Speed 3287.92 samples/sec Loss 1.4042 LearningRate 0.0029 Epoch: 16 Global Step: 205780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:55,604-Speed 3290.77 samples/sec Loss 1.3606 LearningRate 0.0029 Epoch: 16 Global Step: 205790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:34:58,692-Speed 3316.97 samples/sec Loss 1.3362 LearningRate 0.0029 Epoch: 16 Global Step: 205800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:01,819-Speed 3276.43 samples/sec Loss 1.3540 LearningRate 0.0029 Epoch: 16 Global Step: 205810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:04,934-Speed 3287.83 samples/sec Loss 1.3490 LearningRate 0.0029 Epoch: 16 Global Step: 205820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:08,091-Speed 3244.74 samples/sec Loss 1.3573 LearningRate 0.0029 Epoch: 16 Global Step: 205830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:35:11,180-Speed 3315.63 samples/sec Loss 1.3720 LearningRate 0.0029 Epoch: 16 Global Step: 205840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:35:14,253-Speed 3333.36 samples/sec Loss 1.4333 LearningRate 0.0029 Epoch: 16 Global Step: 205850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:17,394-Speed 3261.30 samples/sec Loss 1.3712 LearningRate 0.0029 Epoch: 16 Global Step: 205860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:20,482-Speed 3317.58 
samples/sec Loss 1.4188 LearningRate 0.0029 Epoch: 16 Global Step: 205870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:23,606-Speed 3278.81 samples/sec Loss 1.3598 LearningRate 0.0029 Epoch: 16 Global Step: 205880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:26,757-Speed 3251.00 samples/sec Loss 1.3793 LearningRate 0.0029 Epoch: 16 Global Step: 205890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:29,924-Speed 3233.63 samples/sec Loss 1.4166 LearningRate 0.0029 Epoch: 16 Global Step: 205900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:33,049-Speed 3278.33 samples/sec Loss 1.3423 LearningRate 0.0029 Epoch: 16 Global Step: 205910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:36,217-Speed 3233.46 samples/sec Loss 1.3467 LearningRate 0.0029 Epoch: 16 Global Step: 205920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:39,319-Speed 3301.46 samples/sec Loss 1.3710 LearningRate 0.0029 Epoch: 16 Global Step: 205930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:42,397-Speed 3327.86 samples/sec Loss 1.4208 LearningRate 0.0029 Epoch: 16 Global Step: 205940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:45,471-Speed 3332.37 samples/sec Loss 1.3856 LearningRate 0.0029 Epoch: 16 Global Step: 205950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:35:48,558-Speed 3318.39 samples/sec Loss 1.3811 LearningRate 0.0029 Epoch: 16 Global Step: 205960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:51,640-Speed 3323.02 samples/sec Loss 1.3958 LearningRate 0.0029 Epoch: 16 Global Step: 205970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:54,722-Speed 3323.53 samples/sec Loss 1.3282 LearningRate 0.0029 Epoch: 16 Global Step: 205980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:35:57,857-Speed 3267.93 samples/sec Loss 1.4024 LearningRate 0.0029 
Epoch: 16 Global Step: 205990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:00,982-Speed 3278.19 samples/sec Loss 1.3363 LearningRate 0.0029 Epoch: 16 Global Step: 206000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:04,141-Speed 3241.84 samples/sec Loss 1.3837 LearningRate 0.0029 Epoch: 16 Global Step: 206010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:07,277-Speed 3266.64 samples/sec Loss 1.3882 LearningRate 0.0029 Epoch: 16 Global Step: 206020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:10,369-Speed 3313.12 samples/sec Loss 1.3611 LearningRate 0.0029 Epoch: 16 Global Step: 206030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:13,481-Speed 3291.86 samples/sec Loss 1.3627 LearningRate 0.0029 Epoch: 16 Global Step: 206040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:16,634-Speed 3248.44 samples/sec Loss 1.3578 LearningRate 0.0029 Epoch: 16 Global Step: 206050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:19,746-Speed 3291.23 samples/sec Loss 1.4148 LearningRate 0.0029 Epoch: 16 Global Step: 206060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:36:22,893-Speed 3254.69 samples/sec Loss 1.3399 LearningRate 0.0029 Epoch: 16 Global Step: 206070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:36:25,987-Speed 3311.03 samples/sec Loss 1.3795 LearningRate 0.0029 Epoch: 16 Global Step: 206080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:29,093-Speed 3298.02 samples/sec Loss 1.3761 LearningRate 0.0029 Epoch: 16 Global Step: 206090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:32,198-Speed 3299.45 samples/sec Loss 1.3990 LearningRate 0.0029 Epoch: 16 Global Step: 206100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:35,292-Speed 3309.97 samples/sec Loss 1.3654 LearningRate 0.0029 Epoch: 16 Global Step: 206110 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:38,443-Speed 3250.95 samples/sec Loss 1.3605 LearningRate 0.0029 Epoch: 16 Global Step: 206120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:41,592-Speed 3252.95 samples/sec Loss 1.3583 LearningRate 0.0029 Epoch: 16 Global Step: 206130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:44,682-Speed 3314.80 samples/sec Loss 1.4042 LearningRate 0.0029 Epoch: 16 Global Step: 206140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:47,769-Speed 3318.03 samples/sec Loss 1.3613 LearningRate 0.0029 Epoch: 16 Global Step: 206150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:50,834-Speed 3342.27 samples/sec Loss 1.3211 LearningRate 0.0029 Epoch: 16 Global Step: 206160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:54,054-Speed 3181.00 samples/sec Loss 1.3518 LearningRate 0.0029 Epoch: 16 Global Step: 206170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:36:57,142-Speed 3317.50 samples/sec Loss 1.4579 LearningRate 0.0029 Epoch: 16 Global Step: 206180 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:00,265-Speed 3280.13 samples/sec Loss 1.3364 LearningRate 0.0029 Epoch: 16 Global Step: 206190 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:03,359-Speed 3310.06 samples/sec Loss 1.3508 LearningRate 0.0029 Epoch: 16 Global Step: 206200 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:06,551-Speed 3209.71 samples/sec Loss 1.3839 LearningRate 0.0029 Epoch: 16 Global Step: 206210 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:09,637-Speed 3318.63 samples/sec Loss 1.3495 LearningRate 0.0029 Epoch: 16 Global Step: 206220 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:12,740-Speed 3301.72 samples/sec Loss 1.3299 LearningRate 0.0029 Epoch: 16 Global Step: 206230 Fp16 Grad Scale: 8192 Required: 4 hours Training: 
2022-04-27 19:37:15,872-Speed 3269.92 samples/sec Loss 1.3952 LearningRate 0.0029 Epoch: 16 Global Step: 206240 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:19,011-Speed 3263.92 samples/sec Loss 1.4030 LearningRate 0.0029 Epoch: 16 Global Step: 206250 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:22,082-Speed 3334.91 samples/sec Loss 1.3498 LearningRate 0.0029 Epoch: 16 Global Step: 206260 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:25,207-Speed 3278.61 samples/sec Loss 1.3374 LearningRate 0.0029 Epoch: 16 Global Step: 206270 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:37:28,310-Speed 3300.48 samples/sec Loss 1.3862 LearningRate 0.0029 Epoch: 16 Global Step: 206280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:31,388-Speed 3327.40 samples/sec Loss 1.3887 LearningRate 0.0029 Epoch: 16 Global Step: 206290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:34,479-Speed 3314.11 samples/sec Loss 1.3786 LearningRate 0.0029 Epoch: 16 Global Step: 206300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:37,620-Speed 3260.70 samples/sec Loss 1.3567 LearningRate 0.0029 Epoch: 16 Global Step: 206310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:40,738-Speed 3285.04 samples/sec Loss 1.3770 LearningRate 0.0029 Epoch: 16 Global Step: 206320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:43,852-Speed 3289.60 samples/sec Loss 1.4107 LearningRate 0.0029 Epoch: 16 Global Step: 206330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:46,937-Speed 3320.58 samples/sec Loss 1.3296 LearningRate 0.0029 Epoch: 16 Global Step: 206340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:50,087-Speed 3251.99 samples/sec Loss 1.3932 LearningRate 0.0029 Epoch: 16 Global Step: 206350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:53,238-Speed 3250.83 
samples/sec Loss 1.3854 LearningRate 0.0029 Epoch: 16 Global Step: 206360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:56,390-Speed 3249.74 samples/sec Loss 1.3759 LearningRate 0.0029 Epoch: 16 Global Step: 206370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:37:59,523-Speed 3268.58 samples/sec Loss 1.4160 LearningRate 0.0029 Epoch: 16 Global Step: 206380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:38:02,629-Speed 3298.29 samples/sec Loss 1.3699 LearningRate 0.0029 Epoch: 16 Global Step: 206390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:38:05,768-Speed 3262.49 samples/sec Loss 1.4352 LearningRate 0.0029 Epoch: 16 Global Step: 206400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:38:08,857-Speed 3316.94 samples/sec Loss 1.3891 LearningRate 0.0029 Epoch: 16 Global Step: 206410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:11,962-Speed 3297.91 samples/sec Loss 1.3576 LearningRate 0.0029 Epoch: 16 Global Step: 206420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:15,053-Speed 3314.28 samples/sec Loss 1.3448 LearningRate 0.0029 Epoch: 16 Global Step: 206430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:18,171-Speed 3285.00 samples/sec Loss 1.3546 LearningRate 0.0029 Epoch: 16 Global Step: 206440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:21,251-Speed 3325.78 samples/sec Loss 1.3887 LearningRate 0.0029 Epoch: 16 Global Step: 206450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:24,379-Speed 3275.04 samples/sec Loss 1.4042 LearningRate 0.0029 Epoch: 16 Global Step: 206460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:27,484-Speed 3299.25 samples/sec Loss 1.3875 LearningRate 0.0029 Epoch: 16 Global Step: 206470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:30,577-Speed 3311.65 samples/sec Loss 1.4069 LearningRate 0.0029 
Epoch: 16 Global Step: 206480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:33,670-Speed 3310.51 samples/sec Loss 1.3366 LearningRate 0.0028 Epoch: 16 Global Step: 206490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:36,822-Speed 3250.20 samples/sec Loss 1.3424 LearningRate 0.0028 Epoch: 16 Global Step: 206500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:39,973-Speed 3251.12 samples/sec Loss 1.3578 LearningRate 0.0028 Epoch: 16 Global Step: 206510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:38:43,066-Speed 3311.99 samples/sec Loss 1.3318 LearningRate 0.0028 Epoch: 16 Global Step: 206520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:46,202-Speed 3266.30 samples/sec Loss 1.3742 LearningRate 0.0028 Epoch: 16 Global Step: 206530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:49,320-Speed 3284.59 samples/sec Loss 1.3462 LearningRate 0.0028 Epoch: 16 Global Step: 206540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:52,433-Speed 3291.10 samples/sec Loss 1.3785 LearningRate 0.0028 Epoch: 16 Global Step: 206550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:55,535-Speed 3301.67 samples/sec Loss 1.4118 LearningRate 0.0028 Epoch: 16 Global Step: 206560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:38:58,669-Speed 3268.89 samples/sec Loss 1.3726 LearningRate 0.0028 Epoch: 16 Global Step: 206570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:01,797-Speed 3274.61 samples/sec Loss 1.4344 LearningRate 0.0028 Epoch: 16 Global Step: 206580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:04,912-Speed 3287.56 samples/sec Loss 1.3991 LearningRate 0.0028 Epoch: 16 Global Step: 206590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:08,007-Speed 3310.11 samples/sec Loss 1.4042 LearningRate 0.0028 Epoch: 16 Global Step: 206600 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:11,105-Speed 3306.55 samples/sec Loss 1.3693 LearningRate 0.0028 Epoch: 16 Global Step: 206610 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:14,243-Speed 3264.63 samples/sec Loss 1.3134 LearningRate 0.0028 Epoch: 16 Global Step: 206620 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:17,338-Speed 3309.28 samples/sec Loss 1.4024 LearningRate 0.0028 Epoch: 16 Global Step: 206630 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:20,430-Speed 3312.58 samples/sec Loss 1.4044 LearningRate 0.0028 Epoch: 16 Global Step: 206640 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:23,561-Speed 3271.52 samples/sec Loss 1.3745 LearningRate 0.0028 Epoch: 16 Global Step: 206650 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:26,670-Speed 3295.82 samples/sec Loss 1.3753 LearningRate 0.0028 Epoch: 16 Global Step: 206660 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:29,806-Speed 3266.03 samples/sec Loss 1.4145 LearningRate 0.0028 Epoch: 16 Global Step: 206670 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:32,945-Speed 3262.48 samples/sec Loss 1.3852 LearningRate 0.0028 Epoch: 16 Global Step: 206680 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:36,041-Speed 3308.15 samples/sec Loss 1.3661 LearningRate 0.0028 Epoch: 16 Global Step: 206690 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:39,157-Speed 3287.68 samples/sec Loss 1.4364 LearningRate 0.0028 Epoch: 16 Global Step: 206700 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:39:42,317-Speed 3241.31 samples/sec Loss 1.3753 LearningRate 0.0028 Epoch: 16 Global Step: 206710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:45,402-Speed 3321.25 samples/sec Loss 1.3465 LearningRate 0.0028 Epoch: 16 Global Step: 206720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:39:48,480-Speed 3327.66 samples/sec Loss 1.4117 LearningRate 0.0028 Epoch: 16 Global Step: 206730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:51,589-Speed 3294.59 samples/sec Loss 1.3808 LearningRate 0.0028 Epoch: 16 Global Step: 206740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:54,798-Speed 3205.74 samples/sec Loss 1.3842 LearningRate 0.0028 Epoch: 16 Global Step: 206750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:39:57,866-Speed 3338.72 samples/sec Loss 1.4021 LearningRate 0.0028 Epoch: 16 Global Step: 206760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:01,030-Speed 3237.63 samples/sec Loss 1.4062 LearningRate 0.0028 Epoch: 16 Global Step: 206770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:04,193-Speed 3237.93 samples/sec Loss 1.3672 LearningRate 0.0028 Epoch: 16 Global Step: 206780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:07,348-Speed 3247.54 samples/sec Loss 1.3484 LearningRate 0.0028 Epoch: 16 Global Step: 206790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:10,411-Speed 3343.67 samples/sec Loss 1.3781 LearningRate 0.0028 Epoch: 16 Global Step: 206800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:13,517-Speed 3298.17 samples/sec Loss 1.3953 LearningRate 0.0028 Epoch: 16 Global Step: 206810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:40:16,645-Speed 3274.42 samples/sec Loss 1.3974 LearningRate 0.0028 Epoch: 16 Global Step: 206820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:40:19,719-Speed 3332.52 samples/sec Loss 1.3811 LearningRate 0.0028 Epoch: 16 Global Step: 206830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:22,807-Speed 3316.94 samples/sec Loss 1.4026 LearningRate 0.0028 Epoch: 16 Global Step: 206840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:25,964-Speed 3244.50 samples/sec Loss 
1.3736 LearningRate 0.0028 Epoch: 16 Global Step: 206850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:29,121-Speed 3244.47 samples/sec Loss 1.3281 LearningRate 0.0028 Epoch: 16 Global Step: 206860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:32,244-Speed 3279.79 samples/sec Loss 1.3831 LearningRate 0.0028 Epoch: 16 Global Step: 206870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:35,356-Speed 3291.62 samples/sec Loss 1.3979 LearningRate 0.0028 Epoch: 16 Global Step: 206880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:38,483-Speed 3275.84 samples/sec Loss 1.4293 LearningRate 0.0028 Epoch: 16 Global Step: 206890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:41,623-Speed 3262.01 samples/sec Loss 1.3344 LearningRate 0.0028 Epoch: 16 Global Step: 206900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:44,721-Speed 3306.67 samples/sec Loss 1.4220 LearningRate 0.0028 Epoch: 16 Global Step: 206910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:47,788-Speed 3339.88 samples/sec Loss 1.3727 LearningRate 0.0028 Epoch: 16 Global Step: 206920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:40:50,873-Speed 3320.12 samples/sec Loss 1.3458 LearningRate 0.0028 Epoch: 16 Global Step: 206930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:40:54,000-Speed 3275.90 samples/sec Loss 1.3644 LearningRate 0.0028 Epoch: 16 Global Step: 206940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:40:57,074-Speed 3332.30 samples/sec Loss 1.3973 LearningRate 0.0028 Epoch: 16 Global Step: 206950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:00,134-Speed 3347.70 samples/sec Loss 1.3602 LearningRate 0.0028 Epoch: 16 Global Step: 206960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:03,207-Speed 3332.84 samples/sec Loss 1.3674 LearningRate 0.0028 Epoch: 16 Global 
Step: 206970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:06,359-Speed 3249.90 samples/sec Loss 1.3404 LearningRate 0.0028 Epoch: 16 Global Step: 206980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:09,436-Speed 3329.06 samples/sec Loss 1.3695 LearningRate 0.0028 Epoch: 16 Global Step: 206990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:12,559-Speed 3280.55 samples/sec Loss 1.3643 LearningRate 0.0028 Epoch: 16 Global Step: 207000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:15,780-Speed 3179.63 samples/sec Loss 1.3902 LearningRate 0.0028 Epoch: 16 Global Step: 207010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:18,882-Speed 3302.40 samples/sec Loss 1.3527 LearningRate 0.0028 Epoch: 16 Global Step: 207020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:21,986-Speed 3298.95 samples/sec Loss 1.4275 LearningRate 0.0028 Epoch: 16 Global Step: 207030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 19:41:25,043-Speed 3352.13 samples/sec Loss 1.3722 LearningRate 0.0028 Epoch: 16 Global Step: 207040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:41:28,207-Speed 3236.68 samples/sec Loss 1.3621 LearningRate 0.0028 Epoch: 16 Global Step: 207050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:31,282-Speed 3331.72 samples/sec Loss 1.4244 LearningRate 0.0028 Epoch: 16 Global Step: 207060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:34,399-Speed 3285.81 samples/sec Loss 1.3626 LearningRate 0.0028 Epoch: 16 Global Step: 207070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:37,559-Speed 3242.09 samples/sec Loss 1.3345 LearningRate 0.0028 Epoch: 16 Global Step: 207080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:40,749-Speed 3210.99 samples/sec Loss 1.3761 LearningRate 0.0028 Epoch: 16 Global Step: 207090 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 19:41:43,855-Speed 3298.10 samples/sec Loss 1.4183 LearningRate 0.0028 Epoch: 16 Global Step: 207100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:46,954-Speed 3304.59 samples/sec Loss 1.3485 LearningRate 0.0028 Epoch: 16 Global Step: 207110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:50,070-Speed 3287.56 samples/sec Loss 1.3866 LearningRate 0.0028 Epoch: 16 Global Step: 207120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:53,182-Speed 3291.66 samples/sec Loss 1.4350 LearningRate 0.0028 Epoch: 16 Global Step: 207130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:56,268-Speed 3319.56 samples/sec Loss 1.4441 LearningRate 0.0028 Epoch: 16 Global Step: 207140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:41:59,357-Speed 3315.22 samples/sec Loss 1.3956 LearningRate 0.0028 Epoch: 16 Global Step: 207150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:42:02,464-Speed 3297.53 samples/sec Loss 1.3312 LearningRate 0.0028 Epoch: 16 Global Step: 207160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:05,573-Speed 3294.29 samples/sec Loss 1.3240 LearningRate 0.0028 Epoch: 16 Global Step: 207170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:08,620-Speed 3361.98 samples/sec Loss 1.3701 LearningRate 0.0028 Epoch: 16 Global Step: 207180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:11,737-Speed 3286.77 samples/sec Loss 1.3609 LearningRate 0.0028 Epoch: 16 Global Step: 207190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:14,885-Speed 3254.14 samples/sec Loss 1.3666 LearningRate 0.0028 Epoch: 16 Global Step: 207200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:17,965-Speed 3325.41 samples/sec Loss 1.3447 LearningRate 0.0028 Epoch: 16 Global Step: 207210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:42:21,047-Speed 3323.17 samples/sec Loss 1.4001 LearningRate 0.0028 Epoch: 16 Global Step: 207220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:24,122-Speed 3331.95 samples/sec Loss 1.4115 LearningRate 0.0027 Epoch: 16 Global Step: 207230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:27,306-Speed 3216.96 samples/sec Loss 1.3707 LearningRate 0.0027 Epoch: 16 Global Step: 207240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:30,444-Speed 3263.66 samples/sec Loss 1.3558 LearningRate 0.0027 Epoch: 16 Global Step: 207250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:33,529-Speed 3320.30 samples/sec Loss 1.3627 LearningRate 0.0027 Epoch: 16 Global Step: 207260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:42:36,663-Speed 3268.90 samples/sec Loss 1.3759 LearningRate 0.0027 Epoch: 16 Global Step: 207270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:39,744-Speed 3323.87 samples/sec Loss 1.3359 LearningRate 0.0027 Epoch: 16 Global Step: 207280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:42,846-Speed 3302.19 samples/sec Loss 1.3704 LearningRate 0.0027 Epoch: 16 Global Step: 207290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:45,932-Speed 3319.39 samples/sec Loss 1.3779 LearningRate 0.0027 Epoch: 16 Global Step: 207300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:49,044-Speed 3291.67 samples/sec Loss 1.3703 LearningRate 0.0027 Epoch: 16 Global Step: 207310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:52,247-Speed 3198.56 samples/sec Loss 1.3613 LearningRate 0.0027 Epoch: 16 Global Step: 207320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:55,352-Speed 3299.02 samples/sec Loss 1.3856 LearningRate 0.0027 Epoch: 16 Global Step: 207330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:42:58,469-Speed 3286.20 samples/sec Loss 
1.3675 LearningRate 0.0027 Epoch: 16 Global Step: 207340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:01,662-Speed 3210.78 samples/sec Loss 1.4030 LearningRate 0.0027 Epoch: 16 Global Step: 207350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:04,746-Speed 3321.17 samples/sec Loss 1.3482 LearningRate 0.0027 Epoch: 16 Global Step: 207360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:07,819-Speed 3333.96 samples/sec Loss 1.3853 LearningRate 0.0027 Epoch: 16 Global Step: 207370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:43:10,859-Speed 3368.32 samples/sec Loss 1.3624 LearningRate 0.0027 Epoch: 16 Global Step: 207380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:13,989-Speed 3273.80 samples/sec Loss 1.3605 LearningRate 0.0027 Epoch: 16 Global Step: 207390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:17,116-Speed 3275.18 samples/sec Loss 1.3408 LearningRate 0.0027 Epoch: 16 Global Step: 207400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:20,219-Speed 3300.93 samples/sec Loss 1.3723 LearningRate 0.0027 Epoch: 16 Global Step: 207410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:23,311-Speed 3312.68 samples/sec Loss 1.3979 LearningRate 0.0027 Epoch: 16 Global Step: 207420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:43:26,387-Speed 3330.44 samples/sec Loss 1.3169 LearningRate 0.0027 Epoch: 16 Global Step: 207430 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:29,484-Speed 3307.37 samples/sec Loss 1.3303 LearningRate 0.0027 Epoch: 16 Global Step: 207440 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:32,570-Speed 3319.60 samples/sec Loss 1.3775 LearningRate 0.0027 Epoch: 16 Global Step: 207450 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:35,698-Speed 3273.97 samples/sec Loss 1.4161 LearningRate 0.0027 Epoch: 16 Global 
Step: 207460 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:38,834-Speed 3266.51 samples/sec Loss 1.3960 LearningRate 0.0027 Epoch: 16 Global Step: 207470 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:41,914-Speed 3325.88 samples/sec Loss 1.3466 LearningRate 0.0027 Epoch: 16 Global Step: 207480 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:45,021-Speed 3297.13 samples/sec Loss 1.3453 LearningRate 0.0027 Epoch: 16 Global Step: 207490 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:48,127-Speed 3298.01 samples/sec Loss 1.4027 LearningRate 0.0027 Epoch: 16 Global Step: 207500 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:51,272-Speed 3256.16 samples/sec Loss 1.3743 LearningRate 0.0027 Epoch: 16 Global Step: 207510 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:54,346-Speed 3332.63 samples/sec Loss 1.3502 LearningRate 0.0027 Epoch: 16 Global Step: 207520 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:43:57,435-Speed 3316.42 samples/sec Loss 1.3630 LearningRate 0.0027 Epoch: 16 Global Step: 207530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:00,624-Speed 3211.99 samples/sec Loss 1.3745 LearningRate 0.0027 Epoch: 16 Global Step: 207540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:03,733-Speed 3294.57 samples/sec Loss 1.3494 LearningRate 0.0027 Epoch: 16 Global Step: 207550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:06,936-Speed 3197.93 samples/sec Loss 1.4171 LearningRate 0.0027 Epoch: 16 Global Step: 207560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:10,049-Speed 3289.75 samples/sec Loss 1.3464 LearningRate 0.0027 Epoch: 16 Global Step: 207570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:13,134-Speed 3320.99 samples/sec Loss 1.3731 LearningRate 0.0027 Epoch: 16 Global Step: 207580 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-04-27 19:44:16,313-Speed 3222.48 samples/sec Loss 1.3767 LearningRate 0.0027 Epoch: 16 Global Step: 207590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:19,441-Speed 3274.94 samples/sec Loss 1.3836 LearningRate 0.0027 Epoch: 16 Global Step: 207600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:22,521-Speed 3325.52 samples/sec Loss 1.3258 LearningRate 0.0027 Epoch: 16 Global Step: 207610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:25,690-Speed 3232.41 samples/sec Loss 1.3574 LearningRate 0.0027 Epoch: 16 Global Step: 207620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:28,771-Speed 3324.64 samples/sec Loss 1.3990 LearningRate 0.0027 Epoch: 16 Global Step: 207630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:44:31,865-Speed 3310.01 samples/sec Loss 1.4055 LearningRate 0.0027 Epoch: 16 Global Step: 207640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:44:35,020-Speed 3246.86 samples/sec Loss 1.3811 LearningRate 0.0027 Epoch: 16 Global Step: 207650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:44:38,114-Speed 3311.34 samples/sec Loss 1.3934 LearningRate 0.0027 Epoch: 16 Global Step: 207660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:44:41,189-Speed 3330.45 samples/sec Loss 1.3586 LearningRate 0.0027 Epoch: 16 Global Step: 207670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:44:44,295-Speed 3298.64 samples/sec Loss 1.4191 LearningRate 0.0027 Epoch: 16 Global Step: 207680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:47,418-Speed 3279.90 samples/sec Loss 1.3557 LearningRate 0.0027 Epoch: 16 Global Step: 207690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:50,519-Speed 3303.35 samples/sec Loss 1.3799 LearningRate 0.0027 Epoch: 16 Global Step: 207700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
19:44:53,618-Speed 3305.06 samples/sec Loss 1.3806 LearningRate 0.0027 Epoch: 16 Global Step: 207710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:56,722-Speed 3300.41 samples/sec Loss 1.3366 LearningRate 0.0027 Epoch: 16 Global Step: 207720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:44:59,838-Speed 3286.65 samples/sec Loss 1.3501 LearningRate 0.0027 Epoch: 16 Global Step: 207730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:02,950-Speed 3291.99 samples/sec Loss 1.3995 LearningRate 0.0027 Epoch: 16 Global Step: 207740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:06,032-Speed 3323.73 samples/sec Loss 1.3097 LearningRate 0.0027 Epoch: 16 Global Step: 207750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:09,135-Speed 3301.13 samples/sec Loss 1.3563 LearningRate 0.0027 Epoch: 16 Global Step: 207760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:12,293-Speed 3243.76 samples/sec Loss 1.3799 LearningRate 0.0027 Epoch: 16 Global Step: 207770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:15,416-Speed 3279.28 samples/sec Loss 1.4311 LearningRate 0.0027 Epoch: 16 Global Step: 207780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:45:18,559-Speed 3260.28 samples/sec Loss 1.3951 LearningRate 0.0027 Epoch: 16 Global Step: 207790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:45:21,649-Speed 3314.05 samples/sec Loss 1.3526 LearningRate 0.0027 Epoch: 16 Global Step: 207800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 19:45:24,798-Speed 3252.43 samples/sec Loss 1.3697 LearningRate 0.0027 Epoch: 16 Global Step: 207810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:27,976-Speed 3224.21 samples/sec Loss 1.4126 LearningRate 0.0027 Epoch: 16 Global Step: 207820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:31,153-Speed 3223.77 samples/sec Loss 
1.4042 LearningRate 0.0027 Epoch: 16 Global Step: 207830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:45:34,279-Speed 3276.55 samples/sec Loss 1.3723 LearningRate 0.0027 Epoch: 16 Global Step: 207840 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:37,433-Speed 3248.36 samples/sec Loss 1.3636 LearningRate 0.0027 Epoch: 16 Global Step: 207850 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:40,536-Speed 3300.18 samples/sec Loss 1.3851 LearningRate 0.0027 Epoch: 16 Global Step: 207860 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:43,631-Speed 3310.62 samples/sec Loss 1.3701 LearningRate 0.0027 Epoch: 16 Global Step: 207870 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:46,755-Speed 3278.49 samples/sec Loss 1.3695 LearningRate 0.0027 Epoch: 16 Global Step: 207880 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:49,866-Speed 3293.19 samples/sec Loss 1.3712 LearningRate 0.0027 Epoch: 16 Global Step: 207890 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:53,039-Speed 3227.70 samples/sec Loss 1.3637 LearningRate 0.0027 Epoch: 16 Global Step: 207900 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:56,135-Speed 3308.30 samples/sec Loss 1.3756 LearningRate 0.0027 Epoch: 16 Global Step: 207910 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:45:59,263-Speed 3275.80 samples/sec Loss 1.3197 LearningRate 0.0027 Epoch: 16 Global Step: 207920 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:46:02,396-Speed 3268.66 samples/sec Loss 1.4447 LearningRate 0.0027 Epoch: 16 Global Step: 207930 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 19:46:05,502-Speed 3298.22 samples/sec Loss 1.3701 LearningRate 0.0027 Epoch: 16 Global Step: 207940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:46:08,635-Speed 3269.05 samples/sec Loss 1.3723 LearningRate 0.0027 Epoch: 16 Global Step: 
207950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 19:46:11,710-Speed 3331.40 samples/sec Loss 1.3718 LearningRate 0.0027 Epoch: 16 Global Step: 207960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:14,842-Speed 3270.75 samples/sec Loss 1.4333 LearningRate 0.0027 Epoch: 16 Global Step: 207970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:17,954-Speed 3292.01 samples/sec Loss 1.4008 LearningRate 0.0027 Epoch: 16 Global Step: 207980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:21,051-Speed 3306.35 samples/sec Loss 1.3697 LearningRate 0.0026 Epoch: 16 Global Step: 207990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:24,150-Speed 3306.11 samples/sec Loss 1.3438 LearningRate 0.0026 Epoch: 16 Global Step: 208000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:27,254-Speed 3300.32 samples/sec Loss 1.3945 LearningRate 0.0026 Epoch: 16 Global Step: 208010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:30,384-Speed 3271.86 samples/sec Loss 1.3888 LearningRate 0.0026 Epoch: 16 Global Step: 208020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:33,450-Speed 3341.01 samples/sec Loss 1.4277 LearningRate 0.0026 Epoch: 16 Global Step: 208030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:36,527-Speed 3329.69 samples/sec Loss 1.4158 LearningRate 0.0026 Epoch: 16 Global Step: 208040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:46:39,586-Speed 3348.20 samples/sec Loss 1.3938 LearningRate 0.0026 Epoch: 16 Global Step: 208050 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:46:42,699-Speed 3291.21 samples/sec Loss 1.3369 LearningRate 0.0026 Epoch: 16 Global Step: 208060 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:46:45,758-Speed 3348.08 samples/sec Loss 1.3931 LearningRate 0.0026 Epoch: 16 Global Step: 208070 Fp16 Grad Scale: 8192 Required: 3 
hours Training: 2022-04-27 19:46:48,861-Speed 3300.89 samples/sec Loss 1.3386 LearningRate 0.0026 Epoch: 16 Global Step: 208080 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:46:51,978-Speed 3286.13 samples/sec Loss 1.3461 LearningRate 0.0026 Epoch: 16 Global Step: 208090 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:46:55,112-Speed 3268.30 samples/sec Loss 1.3083 LearningRate 0.0026 Epoch: 16 Global Step: 208100 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:46:58,170-Speed 3350.32 samples/sec Loss 1.3696 LearningRate 0.0026 Epoch: 16 Global Step: 208110 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:47:01,232-Speed 3344.75 samples/sec Loss 1.3425 LearningRate 0.0026 Epoch: 16 Global Step: 208120 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:47:04,334-Speed 3302.86 samples/sec Loss 1.3942 LearningRate 0.0026 Epoch: 16 Global Step: 208130 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:47:07,449-Speed 3287.71 samples/sec Loss 1.3582 LearningRate 0.0026 Epoch: 16 Global Step: 208140 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:47:10,604-Speed 3246.98 samples/sec Loss 1.3362 LearningRate 0.0026 Epoch: 16 Global Step: 208150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:13,660-Speed 3351.52 samples/sec Loss 1.3753 LearningRate 0.0026 Epoch: 16 Global Step: 208160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:16,804-Speed 3258.42 samples/sec Loss 1.3964 LearningRate 0.0026 Epoch: 16 Global Step: 208170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:19,904-Speed 3303.66 samples/sec Loss 1.3835 LearningRate 0.0026 Epoch: 16 Global Step: 208180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:23,005-Speed 3304.07 samples/sec Loss 1.3756 LearningRate 0.0026 Epoch: 16 Global Step: 208190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:26,151-Speed 
3255.90 samples/sec Loss 1.3864 LearningRate 0.0026 Epoch: 16 Global Step: 208200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:29,328-Speed 3223.71 samples/sec Loss 1.4201 LearningRate 0.0026 Epoch: 16 Global Step: 208210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:32,426-Speed 3305.95 samples/sec Loss 1.3882 LearningRate 0.0026 Epoch: 16 Global Step: 208220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:35,516-Speed 3314.96 samples/sec Loss 1.4143 LearningRate 0.0026 Epoch: 16 Global Step: 208230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:38,625-Speed 3294.80 samples/sec Loss 1.3648 LearningRate 0.0026 Epoch: 16 Global Step: 208240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:41,695-Speed 3336.76 samples/sec Loss 1.3643 LearningRate 0.0026 Epoch: 16 Global Step: 208250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:47:44,809-Speed 3289.05 samples/sec Loss 1.3879 LearningRate 0.0026 Epoch: 16 Global Step: 208260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:47,973-Speed 3238.22 samples/sec Loss 1.3695 LearningRate 0.0026 Epoch: 16 Global Step: 208270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:51,140-Speed 3233.26 samples/sec Loss 1.3694 LearningRate 0.0026 Epoch: 16 Global Step: 208280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:54,241-Speed 3304.29 samples/sec Loss 1.3962 LearningRate 0.0026 Epoch: 16 Global Step: 208290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:47:57,328-Speed 3317.68 samples/sec Loss 1.4391 LearningRate 0.0026 Epoch: 16 Global Step: 208300 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:00,535-Speed 3194.28 samples/sec Loss 1.3106 LearningRate 0.0026 Epoch: 16 Global Step: 208310 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:03,616-Speed 3324.44 samples/sec Loss 1.4052 LearningRate 
0.0026 Epoch: 16 Global Step: 208320 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:06,731-Speed 3288.31 samples/sec Loss 1.3866 LearningRate 0.0026 Epoch: 16 Global Step: 208330 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:09,803-Speed 3334.73 samples/sec Loss 1.3828 LearningRate 0.0026 Epoch: 16 Global Step: 208340 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:12,936-Speed 3270.01 samples/sec Loss 1.3225 LearningRate 0.0026 Epoch: 16 Global Step: 208350 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:16,028-Speed 3312.45 samples/sec Loss 1.3782 LearningRate 0.0026 Epoch: 16 Global Step: 208360 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:19,169-Speed 3261.57 samples/sec Loss 1.3321 LearningRate 0.0026 Epoch: 16 Global Step: 208370 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:22,233-Speed 3342.74 samples/sec Loss 1.3738 LearningRate 0.0026 Epoch: 16 Global Step: 208380 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:25,350-Speed 3286.42 samples/sec Loss 1.4254 LearningRate 0.0026 Epoch: 16 Global Step: 208390 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:48:28,572-Speed 3179.88 samples/sec Loss 1.3793 LearningRate 0.0026 Epoch: 16 Global Step: 208400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:31,709-Speed 3264.58 samples/sec Loss 1.3395 LearningRate 0.0026 Epoch: 16 Global Step: 208410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:34,857-Speed 3254.10 samples/sec Loss 1.3467 LearningRate 0.0026 Epoch: 16 Global Step: 208420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:37,983-Speed 3276.37 samples/sec Loss 1.4075 LearningRate 0.0026 Epoch: 16 Global Step: 208430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:41,070-Speed 3318.74 samples/sec Loss 1.3999 LearningRate 0.0026 Epoch: 16 Global Step: 208440 Fp16 Grad 
Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:44,189-Speed 3283.57 samples/sec Loss 1.3470 LearningRate 0.0026 Epoch: 16 Global Step: 208450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:47,268-Speed 3327.72 samples/sec Loss 1.3563 LearningRate 0.0026 Epoch: 16 Global Step: 208460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:50,463-Speed 3205.10 samples/sec Loss 1.3452 LearningRate 0.0026 Epoch: 16 Global Step: 208470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:53,544-Speed 3324.92 samples/sec Loss 1.3708 LearningRate 0.0026 Epoch: 16 Global Step: 208480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:56,594-Speed 3359.14 samples/sec Loss 1.4171 LearningRate 0.0026 Epoch: 16 Global Step: 208490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:48:59,659-Speed 3341.39 samples/sec Loss 1.3261 LearningRate 0.0026 Epoch: 16 Global Step: 208500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:49:02,695-Speed 3374.12 samples/sec Loss 1.3819 LearningRate 0.0026 Epoch: 16 Global Step: 208510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:05,770-Speed 3331.38 samples/sec Loss 1.3804 LearningRate 0.0026 Epoch: 16 Global Step: 208520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:08,856-Speed 3318.47 samples/sec Loss 1.4246 LearningRate 0.0026 Epoch: 16 Global Step: 208530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:12,041-Speed 3216.76 samples/sec Loss 1.4267 LearningRate 0.0026 Epoch: 16 Global Step: 208540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:15,121-Speed 3325.75 samples/sec Loss 1.4536 LearningRate 0.0026 Epoch: 16 Global Step: 208550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:18,194-Speed 3333.24 samples/sec Loss 1.3707 LearningRate 0.0026 Epoch: 16 Global Step: 208560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2022-04-27 19:49:21,289-Speed 3309.60 samples/sec Loss 1.3757 LearningRate 0.0026 Epoch: 16 Global Step: 208570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:24,470-Speed 3219.84 samples/sec Loss 1.4341 LearningRate 0.0026 Epoch: 16 Global Step: 208580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:27,622-Speed 3250.26 samples/sec Loss 1.3703 LearningRate 0.0026 Epoch: 16 Global Step: 208590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:30,699-Speed 3328.28 samples/sec Loss 1.3302 LearningRate 0.0026 Epoch: 16 Global Step: 208600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:33,776-Speed 3329.92 samples/sec Loss 1.3373 LearningRate 0.0026 Epoch: 16 Global Step: 208610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:49:36,863-Speed 3318.25 samples/sec Loss 1.3567 LearningRate 0.0026 Epoch: 16 Global Step: 208620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:49:39,941-Speed 3327.60 samples/sec Loss 1.4237 LearningRate 0.0026 Epoch: 16 Global Step: 208630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:43,087-Speed 3255.58 samples/sec Loss 1.3378 LearningRate 0.0026 Epoch: 16 Global Step: 208640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:46,189-Speed 3301.84 samples/sec Loss 1.3639 LearningRate 0.0026 Epoch: 16 Global Step: 208650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:49:49,320-Speed 3271.55 samples/sec Loss 1.3610 LearningRate 0.0026 Epoch: 16 Global Step: 208660 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:49:52,449-Speed 3273.86 samples/sec Loss 1.3432 LearningRate 0.0026 Epoch: 16 Global Step: 208670 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:49:55,589-Speed 3262.66 samples/sec Loss 1.3908 LearningRate 0.0026 Epoch: 16 Global Step: 208680 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:49:58,717-Speed 3274.22 
samples/sec Loss 1.3595 LearningRate 0.0026 Epoch: 16 Global Step: 208690 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:01,805-Speed 3317.08 samples/sec Loss 1.4001 LearningRate 0.0026 Epoch: 16 Global Step: 208700 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:04,871-Speed 3340.83 samples/sec Loss 1.3581 LearningRate 0.0026 Epoch: 16 Global Step: 208710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:07,941-Speed 3336.72 samples/sec Loss 1.3966 LearningRate 0.0026 Epoch: 16 Global Step: 208720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:11,008-Speed 3339.92 samples/sec Loss 1.3942 LearningRate 0.0026 Epoch: 16 Global Step: 208730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:14,156-Speed 3253.78 samples/sec Loss 1.3262 LearningRate 0.0026 Epoch: 16 Global Step: 208740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:17,310-Speed 3248.20 samples/sec Loss 1.3894 LearningRate 0.0026 Epoch: 16 Global Step: 208750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:20,373-Speed 3344.42 samples/sec Loss 1.3674 LearningRate 0.0025 Epoch: 16 Global Step: 208760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:23,550-Speed 3224.57 samples/sec Loss 1.3860 LearningRate 0.0025 Epoch: 16 Global Step: 208770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:26,676-Speed 3276.39 samples/sec Loss 1.3781 LearningRate 0.0025 Epoch: 16 Global Step: 208780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:29,734-Speed 3350.48 samples/sec Loss 1.3401 LearningRate 0.0025 Epoch: 16 Global Step: 208790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:32,786-Speed 3355.66 samples/sec Loss 1.4181 LearningRate 0.0025 Epoch: 16 Global Step: 208800 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:50:35,856-Speed 3336.82 samples/sec Loss 1.3762 LearningRate 0.0025 Epoch: 16 
Global Step: 208810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:38,949-Speed 3311.79 samples/sec Loss 1.4054 LearningRate 0.0025 Epoch: 16 Global Step: 208820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:42,069-Speed 3282.48 samples/sec Loss 1.4028 LearningRate 0.0025 Epoch: 16 Global Step: 208830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:45,123-Speed 3354.24 samples/sec Loss 1.3744 LearningRate 0.0025 Epoch: 16 Global Step: 208840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:48,185-Speed 3345.42 samples/sec Loss 1.3279 LearningRate 0.0025 Epoch: 16 Global Step: 208850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:51,334-Speed 3252.87 samples/sec Loss 1.4134 LearningRate 0.0025 Epoch: 16 Global Step: 208860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:54,473-Speed 3262.99 samples/sec Loss 1.4125 LearningRate 0.0025 Epoch: 16 Global Step: 208870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:50:57,565-Speed 3312.52 samples/sec Loss 1.3895 LearningRate 0.0025 Epoch: 16 Global Step: 208880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:00,694-Speed 3274.21 samples/sec Loss 1.3418 LearningRate 0.0025 Epoch: 16 Global Step: 208890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:03,825-Speed 3271.89 samples/sec Loss 1.3946 LearningRate 0.0025 Epoch: 16 Global Step: 208900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:06,909-Speed 3321.45 samples/sec Loss 1.3280 LearningRate 0.0025 Epoch: 16 Global Step: 208910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:51:09,952-Speed 3365.43 samples/sec Loss 1.3498 LearningRate 0.0025 Epoch: 16 Global Step: 208920 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:13,081-Speed 3274.68 samples/sec Loss 1.3034 LearningRate 0.0025 Epoch: 16 Global Step: 208930 Fp16 Grad Scale: 8192 Required: 3 
hours Training: 2022-04-27 19:51:16,204-Speed 3280.12 samples/sec Loss 1.3898 LearningRate 0.0025 Epoch: 16 Global Step: 208940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:19,296-Speed 3312.22 samples/sec Loss 1.3972 LearningRate 0.0025 Epoch: 16 Global Step: 208950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:22,360-Speed 3342.81 samples/sec Loss 1.3949 LearningRate 0.0025 Epoch: 16 Global Step: 208960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:25,644-Speed 3119.95 samples/sec Loss 1.3627 LearningRate 0.0025 Epoch: 16 Global Step: 208970 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:28,837-Speed 3207.67 samples/sec Loss 1.3972 LearningRate 0.0025 Epoch: 16 Global Step: 208980 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:31,978-Speed 3261.42 samples/sec Loss 1.3949 LearningRate 0.0025 Epoch: 16 Global Step: 208990 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:35,147-Speed 3232.34 samples/sec Loss 1.3767 LearningRate 0.0025 Epoch: 16 Global Step: 209000 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:38,258-Speed 3292.96 samples/sec Loss 1.3744 LearningRate 0.0025 Epoch: 16 Global Step: 209010 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:51:41,425-Speed 3234.49 samples/sec Loss 1.4114 LearningRate 0.0025 Epoch: 16 Global Step: 209020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:51:44,570-Speed 3257.31 samples/sec Loss 1.3775 LearningRate 0.0025 Epoch: 16 Global Step: 209030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:51:47,713-Speed 3258.46 samples/sec Loss 1.3232 LearningRate 0.0025 Epoch: 16 Global Step: 209040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:51:50,850-Speed 3265.51 samples/sec Loss 1.3702 LearningRate 0.0025 Epoch: 16 Global Step: 209050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:51:54,018-Speed 
3233.39 samples/sec Loss 1.3829 LearningRate 0.0025 Epoch: 16 Global Step: 209060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:51:57,091-Speed 3333.27 samples/sec Loss 1.3880 LearningRate 0.0025 Epoch: 16 Global Step: 209070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:00,217-Speed 3276.35 samples/sec Loss 1.4159 LearningRate 0.0025 Epoch: 16 Global Step: 209080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:03,364-Speed 3255.37 samples/sec Loss 1.3764 LearningRate 0.0025 Epoch: 16 Global Step: 209090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:06,558-Speed 3206.39 samples/sec Loss 1.3406 LearningRate 0.0025 Epoch: 16 Global Step: 209100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:09,645-Speed 3318.75 samples/sec Loss 1.3390 LearningRate 0.0025 Epoch: 16 Global Step: 209110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:12,769-Speed 3278.80 samples/sec Loss 1.3829 LearningRate 0.0025 Epoch: 16 Global Step: 209120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:52:15,924-Speed 3246.87 samples/sec Loss 1.3516 LearningRate 0.0025 Epoch: 16 Global Step: 209130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:52:19,080-Speed 3245.63 samples/sec Loss 1.3536 LearningRate 0.0025 Epoch: 16 Global Step: 209140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:52:22,202-Speed 3281.07 samples/sec Loss 1.4125 LearningRate 0.0025 Epoch: 16 Global Step: 209150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:52:25,345-Speed 3258.77 samples/sec Loss 1.3747 LearningRate 0.0025 Epoch: 16 Global Step: 209160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:52:28,423-Speed 3327.51 samples/sec Loss 1.3885 LearningRate 0.0025 Epoch: 16 Global Step: 209170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:52:31,518-Speed 3309.89 samples/sec Loss 1.3711 
LearningRate 0.0025 Epoch: 16 Global Step: 209180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:52:34,605-Speed 3318.83 samples/sec Loss 1.3910 LearningRate 0.0025 Epoch: 16 Global Step: 209190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:37,718-Speed 3290.36 samples/sec Loss 1.3430 LearningRate 0.0025 Epoch: 16 Global Step: 209200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:40,846-Speed 3274.27 samples/sec Loss 1.4279 LearningRate 0.0025 Epoch: 16 Global Step: 209210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:43,958-Speed 3291.93 samples/sec Loss 1.3243 LearningRate 0.0025 Epoch: 16 Global Step: 209220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:47,052-Speed 3310.98 samples/sec Loss 1.3644 LearningRate 0.0025 Epoch: 16 Global Step: 209230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:50,150-Speed 3305.78 samples/sec Loss 1.3893 LearningRate 0.0025 Epoch: 16 Global Step: 209240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:53,273-Speed 3280.01 samples/sec Loss 1.3852 LearningRate 0.0025 Epoch: 16 Global Step: 209250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:56,393-Speed 3283.64 samples/sec Loss 1.3490 LearningRate 0.0025 Epoch: 16 Global Step: 209260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:52:59,546-Speed 3248.57 samples/sec Loss 1.3922 LearningRate 0.0025 Epoch: 16 Global Step: 209270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:02,664-Speed 3285.42 samples/sec Loss 1.3950 LearningRate 0.0025 Epoch: 16 Global Step: 209280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:05,850-Speed 3214.69 samples/sec Loss 1.3013 LearningRate 0.0025 Epoch: 16 Global Step: 209290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:08,967-Speed 3287.35 samples/sec Loss 1.3566 LearningRate 0.0025 Epoch: 16 Global Step: 
209300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:12,055-Speed 3316.82 samples/sec Loss 1.3652 LearningRate 0.0025 Epoch: 16 Global Step: 209310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:15,174-Speed 3284.65 samples/sec Loss 1.3543 LearningRate 0.0025 Epoch: 16 Global Step: 209320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:18,253-Speed 3325.98 samples/sec Loss 1.3652 LearningRate 0.0025 Epoch: 16 Global Step: 209330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:21,362-Speed 3294.84 samples/sec Loss 1.3633 LearningRate 0.0025 Epoch: 16 Global Step: 209340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:24,466-Speed 3300.55 samples/sec Loss 1.3784 LearningRate 0.0025 Epoch: 16 Global Step: 209350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:27,574-Speed 3295.15 samples/sec Loss 1.3172 LearningRate 0.0025 Epoch: 16 Global Step: 209360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:30,698-Speed 3278.90 samples/sec Loss 1.3579 LearningRate 0.0025 Epoch: 16 Global Step: 209370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:33,769-Speed 3335.55 samples/sec Loss 1.4175 LearningRate 0.0025 Epoch: 16 Global Step: 209380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:36,911-Speed 3259.86 samples/sec Loss 1.3082 LearningRate 0.0025 Epoch: 16 Global Step: 209390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:53:40,063-Speed 3249.82 samples/sec Loss 1.3856 LearningRate 0.0025 Epoch: 16 Global Step: 209400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:53:43,176-Speed 3290.38 samples/sec Loss 1.3790 LearningRate 0.0025 Epoch: 16 Global Step: 209410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:53:46,256-Speed 3325.75 samples/sec Loss 1.3658 LearningRate 0.0025 Epoch: 16 Global Step: 209420 Fp16 Grad Scale: 8192 Required: 3 
hours Training: 2022-04-27 19:53:49,348-Speed 3312.93 samples/sec Loss 1.4021 LearningRate 0.0025 Epoch: 16 Global Step: 209430 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:53:52,544-Speed 3205.69 samples/sec Loss 1.3930 LearningRate 0.0025 Epoch: 16 Global Step: 209440 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:53:55,611-Speed 3339.30 samples/sec Loss 1.3685 LearningRate 0.0025 Epoch: 16 Global Step: 209450 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:53:58,707-Speed 3308.09 samples/sec Loss 1.3341 LearningRate 0.0025 Epoch: 16 Global Step: 209460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:01,798-Speed 3314.25 samples/sec Loss 1.3956 LearningRate 0.0025 Epoch: 16 Global Step: 209470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:04,870-Speed 3335.29 samples/sec Loss 1.3493 LearningRate 0.0025 Epoch: 16 Global Step: 209480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:07,977-Speed 3296.80 samples/sec Loss 1.3693 LearningRate 0.0025 Epoch: 16 Global Step: 209490 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:11,080-Speed 3301.30 samples/sec Loss 1.3683 LearningRate 0.0025 Epoch: 16 Global Step: 209500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:14,153-Speed 3332.98 samples/sec Loss 1.4294 LearningRate 0.0025 Epoch: 16 Global Step: 209510 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:17,318-Speed 3236.58 samples/sec Loss 1.3502 LearningRate 0.0025 Epoch: 16 Global Step: 209520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:54:20,359-Speed 3368.31 samples/sec Loss 1.3576 LearningRate 0.0025 Epoch: 16 Global Step: 209530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:54:23,556-Speed 3204.93 samples/sec Loss 1.3697 LearningRate 0.0024 Epoch: 16 Global Step: 209540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:54:26,717-Speed 
3240.35 samples/sec Loss 1.3241 LearningRate 0.0024 Epoch: 16 Global Step: 209550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:54:29,805-Speed 3316.45 samples/sec Loss 1.4060 LearningRate 0.0024 Epoch: 16 Global Step: 209560 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:32,906-Speed 3304.10 samples/sec Loss 1.4041 LearningRate 0.0024 Epoch: 16 Global Step: 209570 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:35,969-Speed 3343.53 samples/sec Loss 1.3713 LearningRate 0.0024 Epoch: 16 Global Step: 209580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:39,075-Speed 3297.85 samples/sec Loss 1.3272 LearningRate 0.0024 Epoch: 16 Global Step: 209590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:42,211-Speed 3266.96 samples/sec Loss 1.3711 LearningRate 0.0024 Epoch: 16 Global Step: 209600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:45,325-Speed 3288.76 samples/sec Loss 1.3668 LearningRate 0.0024 Epoch: 16 Global Step: 209610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:48,417-Speed 3312.52 samples/sec Loss 1.3827 LearningRate 0.0024 Epoch: 16 Global Step: 209620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:51,515-Speed 3306.82 samples/sec Loss 1.3273 LearningRate 0.0024 Epoch: 16 Global Step: 209630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:54,605-Speed 3314.89 samples/sec Loss 1.3606 LearningRate 0.0024 Epoch: 16 Global Step: 209640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:54:57,660-Speed 3352.83 samples/sec Loss 1.3771 LearningRate 0.0024 Epoch: 16 Global Step: 209650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:00,735-Speed 3332.33 samples/sec Loss 1.3680 LearningRate 0.0024 Epoch: 16 Global Step: 209660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:03,831-Speed 3308.42 samples/sec Loss 1.4440 LearningRate 0.0024 
Epoch: 16 Global Step: 209670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:06,965-Speed 3268.20 samples/sec Loss 1.3508 LearningRate 0.0024 Epoch: 16 Global Step: 209680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:10,049-Speed 3321.93 samples/sec Loss 1.3977 LearningRate 0.0024 Epoch: 16 Global Step: 209690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:13,188-Speed 3262.30 samples/sec Loss 1.3448 LearningRate 0.0024 Epoch: 16 Global Step: 209700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:16,302-Speed 3289.90 samples/sec Loss 1.3695 LearningRate 0.0024 Epoch: 16 Global Step: 209710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:19,411-Speed 3294.47 samples/sec Loss 1.3372 LearningRate 0.0024 Epoch: 16 Global Step: 209720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:22,492-Speed 3324.55 samples/sec Loss 1.3649 LearningRate 0.0024 Epoch: 16 Global Step: 209730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:25,555-Speed 3344.34 samples/sec Loss 1.3404 LearningRate 0.0024 Epoch: 16 Global Step: 209740 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:28,607-Speed 3356.97 samples/sec Loss 1.3278 LearningRate 0.0024 Epoch: 16 Global Step: 209750 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:31,656-Speed 3358.63 samples/sec Loss 1.3745 LearningRate 0.0024 Epoch: 16 Global Step: 209760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:34,703-Speed 3362.19 samples/sec Loss 1.3930 LearningRate 0.0024 Epoch: 16 Global Step: 209770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:37,867-Speed 3238.07 samples/sec Loss 1.3916 LearningRate 0.0024 Epoch: 16 Global Step: 209780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:41,055-Speed 3212.31 samples/sec Loss 1.3867 LearningRate 0.0024 Epoch: 16 Global Step: 209790 Fp16 Grad Scale: 
8192 Required: 3 hours Training: 2022-04-27 19:55:44,146-Speed 3313.98 samples/sec Loss 1.3550 LearningRate 0.0024 Epoch: 16 Global Step: 209800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:47,253-Speed 3297.53 samples/sec Loss 1.3886 LearningRate 0.0024 Epoch: 16 Global Step: 209810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:50,391-Speed 3264.05 samples/sec Loss 1.3708 LearningRate 0.0024 Epoch: 16 Global Step: 209820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:53,504-Speed 3290.57 samples/sec Loss 1.3675 LearningRate 0.0024 Epoch: 16 Global Step: 209830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:55:56,642-Speed 3264.12 samples/sec Loss 1.3306 LearningRate 0.0024 Epoch: 16 Global Step: 209840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:55:59,733-Speed 3314.02 samples/sec Loss 1.3721 LearningRate 0.0024 Epoch: 16 Global Step: 209850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:56:02,827-Speed 3310.68 samples/sec Loss 1.3850 LearningRate 0.0024 Epoch: 16 Global Step: 209860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:56:05,959-Speed 3269.85 samples/sec Loss 1.3107 LearningRate 0.0024 Epoch: 16 Global Step: 209870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:56:09,077-Speed 3285.68 samples/sec Loss 1.3686 LearningRate 0.0024 Epoch: 16 Global Step: 209880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:56:12,175-Speed 3305.84 samples/sec Loss 1.3925 LearningRate 0.0024 Epoch: 16 Global Step: 209890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:56:15,290-Speed 3288.42 samples/sec Loss 1.3879 LearningRate 0.0024 Epoch: 16 Global Step: 209900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:56:18,400-Speed 3293.80 samples/sec Loss 1.3374 LearningRate 0.0024 Epoch: 16 Global Step: 209910 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 
19:56:21,475-Speed 3331.46 samples/sec Loss 1.3876 LearningRate 0.0024 Epoch: 16 Global Step: 209920 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:56:24,614-Speed 3262.73 samples/sec Loss 1.3667 LearningRate 0.0024 Epoch: 16 Global Step: 209930 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:56:27,820-Speed 3195.63 samples/sec Loss 1.3676 LearningRate 0.0024 Epoch: 16 Global Step: 209940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:56:30,905-Speed 3319.70 samples/sec Loss 1.3676 LearningRate 0.0024 Epoch: 16 Global Step: 209950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:56:33,969-Speed 3342.95 samples/sec Loss 1.4659 LearningRate 0.0024 Epoch: 16 Global Step: 209960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:56:37,079-Speed 3293.60 samples/sec Loss 1.3709 LearningRate 0.0024 Epoch: 16 Global Step: 209970 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:56:40,250-Speed 3230.81 samples/sec Loss 1.3512 LearningRate 0.0024 Epoch: 16 Global Step: 209980 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:56:43,452-Speed 3199.73 samples/sec Loss 1.3644 LearningRate 0.0024 Epoch: 16 Global Step: 209990 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:56:46,546-Speed 3310.69 samples/sec Loss 1.3936 LearningRate 0.0024 Epoch: 16 Global Step: 210000 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:56:49,647-Speed 3303.27 samples/sec Loss 1.3052 LearningRate 0.0024 Epoch: 16 Global Step: 210010 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:56:52,827-Speed 3221.02 samples/sec Loss 1.4020 LearningRate 0.0024 Epoch: 16 Global Step: 210020 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:56:55,897-Speed 3336.15 samples/sec Loss 1.3936 LearningRate 0.0024 Epoch: 16 Global Step: 210030 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:56:58,971-Speed 3332.54 samples/sec Loss 1.3728 
LearningRate 0.0024 Epoch: 16 Global Step: 210040 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:57:02,090-Speed 3284.15 samples/sec Loss 1.3725 LearningRate 0.0024 Epoch: 16 Global Step: 210050 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:57:05,185-Speed 3309.39 samples/sec Loss 1.4073 LearningRate 0.0024 Epoch: 16 Global Step: 210060 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 19:57:08,239-Speed 3355.10 samples/sec Loss 1.3350 LearningRate 0.0024 Epoch: 16 Global Step: 210070 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:11,328-Speed 3315.62 samples/sec Loss 1.3330 LearningRate 0.0024 Epoch: 16 Global Step: 210080 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:14,448-Speed 3283.19 samples/sec Loss 1.4138 LearningRate 0.0024 Epoch: 16 Global Step: 210090 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:17,563-Speed 3288.23 samples/sec Loss 1.3337 LearningRate 0.0024 Epoch: 16 Global Step: 210100 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:20,660-Speed 3307.53 samples/sec Loss 1.3420 LearningRate 0.0024 Epoch: 16 Global Step: 210110 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:23,783-Speed 3279.82 samples/sec Loss 1.3548 LearningRate 0.0024 Epoch: 16 Global Step: 210120 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:26,937-Speed 3248.24 samples/sec Loss 1.3355 LearningRate 0.0024 Epoch: 16 Global Step: 210130 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:30,070-Speed 3268.96 samples/sec Loss 1.4200 LearningRate 0.0024 Epoch: 16 Global Step: 210140 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:33,146-Speed 3330.60 samples/sec Loss 1.3328 LearningRate 0.0024 Epoch: 16 Global Step: 210150 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:36,329-Speed 3218.16 samples/sec Loss 1.3518 LearningRate 0.0024 Epoch: 16 Global Step: 210160 Fp16 
Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:57:39,429-Speed 3304.68 samples/sec Loss 1.3511 LearningRate 0.0024 Epoch: 16 Global Step: 210170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:57:42,555-Speed 3276.40 samples/sec Loss 1.3583 LearningRate 0.0024 Epoch: 16 Global Step: 210180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:57:45,630-Speed 3331.84 samples/sec Loss 1.3645 LearningRate 0.0024 Epoch: 16 Global Step: 210190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:57:48,755-Speed 3277.70 samples/sec Loss 1.3720 LearningRate 0.0024 Epoch: 16 Global Step: 210200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:57:51,852-Speed 3307.30 samples/sec Loss 1.3622 LearningRate 0.0024 Epoch: 16 Global Step: 210210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:57:54,921-Speed 3337.77 samples/sec Loss 1.3444 LearningRate 0.0024 Epoch: 16 Global Step: 210220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:57:57,983-Speed 3345.73 samples/sec Loss 1.3682 LearningRate 0.0024 Epoch: 16 Global Step: 210230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:01,108-Speed 3277.49 samples/sec Loss 1.3852 LearningRate 0.0024 Epoch: 16 Global Step: 210240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:04,202-Speed 3310.53 samples/sec Loss 1.3469 LearningRate 0.0024 Epoch: 16 Global Step: 210250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:07,397-Speed 3206.36 samples/sec Loss 1.3814 LearningRate 0.0024 Epoch: 16 Global Step: 210260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:10,540-Speed 3259.66 samples/sec Loss 1.4169 LearningRate 0.0024 Epoch: 16 Global Step: 210270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:58:13,627-Speed 3317.48 samples/sec Loss 1.3488 LearningRate 0.0024 Epoch: 16 Global Step: 210280 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-27 19:58:16,765-Speed 3264.55 samples/sec Loss 1.3277 LearningRate 0.0024 Epoch: 16 Global Step: 210290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:58:19,841-Speed 3329.60 samples/sec Loss 1.3432 LearningRate 0.0024 Epoch: 16 Global Step: 210300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:22,938-Speed 3308.36 samples/sec Loss 1.3687 LearningRate 0.0024 Epoch: 16 Global Step: 210310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:26,041-Speed 3300.65 samples/sec Loss 1.3751 LearningRate 0.0024 Epoch: 16 Global Step: 210320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:29,126-Speed 3320.79 samples/sec Loss 1.3405 LearningRate 0.0024 Epoch: 16 Global Step: 210330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:32,200-Speed 3332.04 samples/sec Loss 1.3770 LearningRate 0.0023 Epoch: 16 Global Step: 210340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:35,311-Speed 3292.31 samples/sec Loss 1.3989 LearningRate 0.0023 Epoch: 16 Global Step: 210350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:38,431-Speed 3283.85 samples/sec Loss 1.3624 LearningRate 0.0023 Epoch: 16 Global Step: 210360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:41,574-Speed 3258.87 samples/sec Loss 1.3951 LearningRate 0.0023 Epoch: 16 Global Step: 210370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:44,642-Speed 3338.62 samples/sec Loss 1.3912 LearningRate 0.0023 Epoch: 16 Global Step: 210380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:47,791-Speed 3253.54 samples/sec Loss 1.3165 LearningRate 0.0023 Epoch: 16 Global Step: 210390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:58:50,954-Speed 3238.02 samples/sec Loss 1.3701 LearningRate 0.0023 Epoch: 16 Global Step: 210400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:58:54,159-Speed 
3195.89 samples/sec Loss 1.3594 LearningRate 0.0023 Epoch: 16 Global Step: 210410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:58:57,244-Speed 3320.29 samples/sec Loss 1.3168 LearningRate 0.0023 Epoch: 16 Global Step: 210420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:59:00,374-Speed 3272.88 samples/sec Loss 1.3800 LearningRate 0.0023 Epoch: 16 Global Step: 210430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:59:03,518-Speed 3257.75 samples/sec Loss 1.3555 LearningRate 0.0023 Epoch: 16 Global Step: 210440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:59:06,634-Speed 3287.86 samples/sec Loss 1.3699 LearningRate 0.0023 Epoch: 16 Global Step: 210450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:59:09,704-Speed 3336.57 samples/sec Loss 1.4021 LearningRate 0.0023 Epoch: 16 Global Step: 210460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:59:12,773-Speed 3337.06 samples/sec Loss 1.3737 LearningRate 0.0023 Epoch: 16 Global Step: 210470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:59:15,861-Speed 3317.96 samples/sec Loss 1.3799 LearningRate 0.0023 Epoch: 16 Global Step: 210480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 19:59:19,045-Speed 3217.26 samples/sec Loss 1.3499 LearningRate 0.0023 Epoch: 16 Global Step: 210490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:22,126-Speed 3324.08 samples/sec Loss 1.3661 LearningRate 0.0023 Epoch: 16 Global Step: 210500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:25,232-Speed 3298.43 samples/sec Loss 1.3825 LearningRate 0.0023 Epoch: 16 Global Step: 210510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:28,316-Speed 3321.35 samples/sec Loss 1.3754 LearningRate 0.0023 Epoch: 16 Global Step: 210520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:31,414-Speed 3306.81 samples/sec Loss 1.3725 
LearningRate 0.0023 Epoch: 16 Global Step: 210530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:34,515-Speed 3302.36 samples/sec Loss 1.3440 LearningRate 0.0023 Epoch: 16 Global Step: 210540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:37,610-Speed 3309.53 samples/sec Loss 1.3238 LearningRate 0.0023 Epoch: 16 Global Step: 210550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:40,740-Speed 3272.99 samples/sec Loss 1.3375 LearningRate 0.0023 Epoch: 16 Global Step: 210560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:43,813-Speed 3333.35 samples/sec Loss 1.3967 LearningRate 0.0023 Epoch: 16 Global Step: 210570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 19:59:46,892-Speed 3326.68 samples/sec Loss 1.3672 LearningRate 0.0023 Epoch: 16 Global Step: 210580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:59:49,990-Speed 3306.30 samples/sec Loss 1.3558 LearningRate 0.0023 Epoch: 16 Global Step: 210590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:59:53,103-Speed 3291.06 samples/sec Loss 1.3251 LearningRate 0.0023 Epoch: 16 Global Step: 210600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:59:56,228-Speed 3278.29 samples/sec Loss 1.4159 LearningRate 0.0023 Epoch: 16 Global Step: 210610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 19:59:59,305-Speed 3329.01 samples/sec Loss 1.3664 LearningRate 0.0023 Epoch: 16 Global Step: 210620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:02,395-Speed 3314.31 samples/sec Loss 1.3719 LearningRate 0.0023 Epoch: 16 Global Step: 210630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:05,464-Speed 3338.51 samples/sec Loss 1.3266 LearningRate 0.0023 Epoch: 16 Global Step: 210640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:08,536-Speed 3334.21 samples/sec Loss 1.3791 LearningRate 0.0023 Epoch: 16 Global Step: 210650 
Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:11,625-Speed 3316.06 samples/sec Loss 1.3600 LearningRate 0.0023 Epoch: 16 Global Step: 210660 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:14,785-Speed 3242.05 samples/sec Loss 1.3242 LearningRate 0.0023 Epoch: 16 Global Step: 210670 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:17,893-Speed 3295.87 samples/sec Loss 1.3707 LearningRate 0.0023 Epoch: 16 Global Step: 210680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:00:20,947-Speed 3353.87 samples/sec Loss 1.4555 LearningRate 0.0023 Epoch: 16 Global Step: 210690 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:24,075-Speed 3275.06 samples/sec Loss 1.3253 LearningRate 0.0023 Epoch: 16 Global Step: 210700 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:27,205-Speed 3272.03 samples/sec Loss 1.3417 LearningRate 0.0023 Epoch: 16 Global Step: 210710 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:30,333-Speed 3275.34 samples/sec Loss 1.3694 LearningRate 0.0023 Epoch: 16 Global Step: 210720 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:33,416-Speed 3321.97 samples/sec Loss 1.3316 LearningRate 0.0023 Epoch: 16 Global Step: 210730 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:36,543-Speed 3275.70 samples/sec Loss 1.3456 LearningRate 0.0023 Epoch: 16 Global Step: 210740 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:39,652-Speed 3295.38 samples/sec Loss 1.3584 LearningRate 0.0023 Epoch: 16 Global Step: 210750 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:42,787-Speed 3266.74 samples/sec Loss 1.3476 LearningRate 0.0023 Epoch: 16 Global Step: 210760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:45,855-Speed 3339.12 samples/sec Loss 1.3394 LearningRate 0.0023 Epoch: 16 Global Step: 210770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 
2022-04-27 20:00:48,952-Speed 3307.06 samples/sec Loss 1.4037 LearningRate 0.0023 Epoch: 16 Global Step: 210780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:00:52,115-Speed 3239.31 samples/sec Loss 1.4054 LearningRate 0.0023 Epoch: 16 Global Step: 210790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:00:55,198-Speed 3322.37 samples/sec Loss 1.3253 LearningRate 0.0023 Epoch: 16 Global Step: 210800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:00:58,287-Speed 3315.96 samples/sec Loss 1.3554 LearningRate 0.0023 Epoch: 16 Global Step: 210810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:01,420-Speed 3270.32 samples/sec Loss 1.3547 LearningRate 0.0023 Epoch: 16 Global Step: 210820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:04,542-Speed 3280.08 samples/sec Loss 1.3331 LearningRate 0.0023 Epoch: 16 Global Step: 210830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:07,650-Speed 3296.15 samples/sec Loss 1.4020 LearningRate 0.0023 Epoch: 16 Global Step: 210840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:10,763-Speed 3290.62 samples/sec Loss 1.3626 LearningRate 0.0023 Epoch: 16 Global Step: 210850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:13,952-Speed 3212.00 samples/sec Loss 1.3596 LearningRate 0.0023 Epoch: 16 Global Step: 210860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:17,112-Speed 3241.91 samples/sec Loss 1.3600 LearningRate 0.0023 Epoch: 16 Global Step: 210870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:20,199-Speed 3318.25 samples/sec Loss 1.3861 LearningRate 0.0023 Epoch: 16 Global Step: 210880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:23,275-Speed 3329.21 samples/sec Loss 1.3400 LearningRate 0.0023 Epoch: 16 Global Step: 210890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:26,406-Speed 3271.80 samples/sec 
Loss 1.3597 LearningRate 0.0023 Epoch: 16 Global Step: 210900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:29,502-Speed 3308.34 samples/sec Loss 1.3753 LearningRate 0.0023 Epoch: 16 Global Step: 210910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:01:32,595-Speed 3311.62 samples/sec Loss 1.3463 LearningRate 0.0023 Epoch: 16 Global Step: 210920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:01:35,689-Speed 3310.48 samples/sec Loss 1.3811 LearningRate 0.0023 Epoch: 16 Global Step: 210930 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:38,847-Speed 3244.44 samples/sec Loss 1.3921 LearningRate 0.0023 Epoch: 16 Global Step: 210940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:42,017-Speed 3230.91 samples/sec Loss 1.3692 LearningRate 0.0023 Epoch: 16 Global Step: 210950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:45,109-Speed 3312.49 samples/sec Loss 1.3658 LearningRate 0.0023 Epoch: 16 Global Step: 210960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:48,309-Speed 3201.23 samples/sec Loss 1.4011 LearningRate 0.0023 Epoch: 16 Global Step: 210970 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:51,409-Speed 3304.99 samples/sec Loss 1.3509 LearningRate 0.0023 Epoch: 16 Global Step: 210980 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:54,553-Speed 3257.49 samples/sec Loss 1.3598 LearningRate 0.0023 Epoch: 16 Global Step: 210990 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:01:57,624-Speed 3335.68 samples/sec Loss 1.3043 LearningRate 0.0023 Epoch: 16 Global Step: 211000 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:00,777-Speed 3248.91 samples/sec Loss 1.4073 LearningRate 0.0023 Epoch: 16 Global Step: 211010 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:03,866-Speed 3315.34 samples/sec Loss 1.3871 LearningRate 0.0023 Epoch: 16 Global 
Step: 211020 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:06,959-Speed 3311.93 samples/sec Loss 1.3290 LearningRate 0.0023 Epoch: 16 Global Step: 211030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:02:10,053-Speed 3310.70 samples/sec Loss 1.4145 LearningRate 0.0023 Epoch: 16 Global Step: 211040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:02:13,181-Speed 3274.36 samples/sec Loss 1.3689 LearningRate 0.0023 Epoch: 16 Global Step: 211050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:02:16,306-Speed 3278.63 samples/sec Loss 1.3302 LearningRate 0.0023 Epoch: 16 Global Step: 211060 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:19,457-Speed 3250.61 samples/sec Loss 1.3197 LearningRate 0.0023 Epoch: 16 Global Step: 211070 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:22,548-Speed 3313.30 samples/sec Loss 1.3649 LearningRate 0.0023 Epoch: 16 Global Step: 211080 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:25,671-Speed 3280.08 samples/sec Loss 1.4359 LearningRate 0.0023 Epoch: 16 Global Step: 211090 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:28,796-Speed 3278.10 samples/sec Loss 1.3681 LearningRate 0.0023 Epoch: 16 Global Step: 211100 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:31,946-Speed 3251.51 samples/sec Loss 1.3724 LearningRate 0.0023 Epoch: 16 Global Step: 211110 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:35,636-Speed 2775.96 samples/sec Loss 1.3336 LearningRate 0.0023 Epoch: 16 Global Step: 211120 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:38,801-Speed 3235.71 samples/sec Loss 1.3859 LearningRate 0.0023 Epoch: 16 Global Step: 211130 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:02:41,963-Speed 3239.51 samples/sec Loss 1.3713 LearningRate 0.0023 Epoch: 16 Global Step: 211140 Fp16 Grad Scale: 8192 Required: 3 hours 
Training: 2022-04-27 20:02:45,223-Speed 3142.95 samples/sec Loss 1.4018 LearningRate 0.0023 Epoch: 16 Global Step: 211150 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:03:17,341-Speed 318.83 samples/sec Loss 1.2997 LearningRate 0.0022 Epoch: 17 Global Step: 211160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:20,538-Speed 3205.46 samples/sec Loss 1.0522 LearningRate 0.0022 Epoch: 17 Global Step: 211170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:24,049-Speed 2916.82 samples/sec Loss 1.0520 LearningRate 0.0022 Epoch: 17 Global Step: 211180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:27,150-Speed 3303.79 samples/sec Loss 1.1083 LearningRate 0.0022 Epoch: 17 Global Step: 211190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:30,239-Speed 3315.99 samples/sec Loss 1.0343 LearningRate 0.0022 Epoch: 17 Global Step: 211200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:33,298-Speed 3348.58 samples/sec Loss 1.0310 LearningRate 0.0022 Epoch: 17 Global Step: 211210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:36,425-Speed 3276.46 samples/sec Loss 1.0416 LearningRate 0.0022 Epoch: 17 Global Step: 211220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:39,564-Speed 3262.47 samples/sec Loss 1.0024 LearningRate 0.0022 Epoch: 17 Global Step: 211230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:42,679-Speed 3289.19 samples/sec Loss 1.0339 LearningRate 0.0022 Epoch: 17 Global Step: 211240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:45,736-Speed 3350.67 samples/sec Loss 1.0568 LearningRate 0.0022 Epoch: 17 Global Step: 211250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:03:48,831-Speed 3309.21 samples/sec Loss 1.0421 LearningRate 0.0022 Epoch: 17 Global Step: 211260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:03:51,969-Speed 
3264.38 samples/sec Loss 0.9979 LearningRate 0.0022 Epoch: 17 Global Step: 211270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:03:55,244-Speed 3127.19 samples/sec Loss 1.0100 LearningRate 0.0022 Epoch: 17 Global Step: 211280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:03:58,280-Speed 3374.63 samples/sec Loss 0.9908 LearningRate 0.0022 Epoch: 17 Global Step: 211290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:01,439-Speed 3242.16 samples/sec Loss 0.9972 LearningRate 0.0022 Epoch: 17 Global Step: 211300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:04,586-Speed 3254.51 samples/sec Loss 1.0896 LearningRate 0.0022 Epoch: 17 Global Step: 211310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:07,737-Speed 3251.18 samples/sec Loss 1.0100 LearningRate 0.0022 Epoch: 17 Global Step: 211320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:10,808-Speed 3335.67 samples/sec Loss 1.0365 LearningRate 0.0022 Epoch: 17 Global Step: 211330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:13,964-Speed 3245.48 samples/sec Loss 1.0561 LearningRate 0.0022 Epoch: 17 Global Step: 211340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:17,080-Speed 3287.53 samples/sec Loss 1.0005 LearningRate 0.0022 Epoch: 17 Global Step: 211350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:20,202-Speed 3281.31 samples/sec Loss 1.0199 LearningRate 0.0022 Epoch: 17 Global Step: 211360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:23,316-Speed 3289.75 samples/sec Loss 1.0507 LearningRate 0.0022 Epoch: 17 Global Step: 211370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:26,523-Speed 3194.93 samples/sec Loss 0.9960 LearningRate 0.0022 Epoch: 17 Global Step: 211380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:29,640-Speed 3286.05 samples/sec Loss 1.0265 
LearningRate 0.0022 Epoch: 17 Global Step: 211390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:04:32,832-Speed 3208.71 samples/sec Loss 1.0338 LearningRate 0.0022 Epoch: 17 Global Step: 211400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:36,882-Speed 2529.41 samples/sec Loss 1.0029 LearningRate 0.0022 Epoch: 17 Global Step: 211410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:40,122-Speed 3161.60 samples/sec Loss 1.0361 LearningRate 0.0022 Epoch: 17 Global Step: 211420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:43,359-Speed 3164.40 samples/sec Loss 1.0356 LearningRate 0.0022 Epoch: 17 Global Step: 211430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:48,335-Speed 2058.37 samples/sec Loss 1.0205 LearningRate 0.0022 Epoch: 17 Global Step: 211440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:51,934-Speed 2846.25 samples/sec Loss 1.0380 LearningRate 0.0022 Epoch: 17 Global Step: 211450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:55,104-Speed 3230.83 samples/sec Loss 1.0479 LearningRate 0.0022 Epoch: 17 Global Step: 211460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:04:58,214-Speed 3293.73 samples/sec Loss 1.0870 LearningRate 0.0022 Epoch: 17 Global Step: 211470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:01,297-Speed 3324.38 samples/sec Loss 1.0298 LearningRate 0.0022 Epoch: 17 Global Step: 211480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:04,400-Speed 3300.73 samples/sec Loss 1.0343 LearningRate 0.0022 Epoch: 17 Global Step: 211490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:07,495-Speed 3310.54 samples/sec Loss 1.0380 LearningRate 0.0022 Epoch: 17 Global Step: 211500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:05:10,555-Speed 3347.11 samples/sec Loss 1.0280 LearningRate 0.0022 Epoch: 17 Global Step: 
211510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:05:13,667-Speed 3291.04 samples/sec Loss 1.0187 LearningRate 0.0022 Epoch: 17 Global Step: 211520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:05:16,780-Speed 3291.48 samples/sec Loss 1.0788 LearningRate 0.0022 Epoch: 17 Global Step: 211530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:05:19,890-Speed 3292.62 samples/sec Loss 1.0560 LearningRate 0.0022 Epoch: 17 Global Step: 211540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:05:22,984-Speed 3310.60 samples/sec Loss 1.0694 LearningRate 0.0022 Epoch: 17 Global Step: 211550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:26,093-Speed 3295.13 samples/sec Loss 1.0440 LearningRate 0.0022 Epoch: 17 Global Step: 211560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:29,246-Speed 3248.40 samples/sec Loss 1.0411 LearningRate 0.0022 Epoch: 17 Global Step: 211570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:32,335-Speed 3318.05 samples/sec Loss 1.0500 LearningRate 0.0022 Epoch: 17 Global Step: 211580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:35,414-Speed 3327.28 samples/sec Loss 1.0654 LearningRate 0.0022 Epoch: 17 Global Step: 211590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:38,560-Speed 3255.39 samples/sec Loss 0.9856 LearningRate 0.0022 Epoch: 17 Global Step: 211600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:41,759-Speed 3201.93 samples/sec Loss 1.0640 LearningRate 0.0022 Epoch: 17 Global Step: 211610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:44,847-Speed 3317.59 samples/sec Loss 1.0468 LearningRate 0.0022 Epoch: 17 Global Step: 211620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:47,917-Speed 3336.71 samples/sec Loss 1.0285 LearningRate 0.0022 Epoch: 17 Global Step: 211630 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-27 20:05:51,078-Speed 3240.35 samples/sec Loss 1.0563 LearningRate 0.0022 Epoch: 17 Global Step: 211640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:05:54,229-Speed 3251.24 samples/sec Loss 1.0155 LearningRate 0.0022 Epoch: 17 Global Step: 211650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:05:57,306-Speed 3327.94 samples/sec Loss 1.0722 LearningRate 0.0022 Epoch: 17 Global Step: 211660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:06:00,406-Speed 3304.33 samples/sec Loss 1.0437 LearningRate 0.0022 Epoch: 17 Global Step: 211670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:06:03,530-Speed 3278.81 samples/sec Loss 1.0850 LearningRate 0.0022 Epoch: 17 Global Step: 211680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:06,661-Speed 3271.93 samples/sec Loss 1.0516 LearningRate 0.0022 Epoch: 17 Global Step: 211690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:09,739-Speed 3327.31 samples/sec Loss 1.0245 LearningRate 0.0022 Epoch: 17 Global Step: 211700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:12,858-Speed 3285.06 samples/sec Loss 1.0730 LearningRate 0.0022 Epoch: 17 Global Step: 211710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:15,947-Speed 3316.27 samples/sec Loss 1.0390 LearningRate 0.0022 Epoch: 17 Global Step: 211720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:19,106-Speed 3241.99 samples/sec Loss 0.9958 LearningRate 0.0022 Epoch: 17 Global Step: 211730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:22,181-Speed 3331.28 samples/sec Loss 1.0646 LearningRate 0.0022 Epoch: 17 Global Step: 211740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:25,270-Speed 3316.16 samples/sec Loss 1.0541 LearningRate 0.0022 Epoch: 17 Global Step: 211750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
20:06:28,372-Speed 3301.45 samples/sec Loss 1.0464 LearningRate 0.0022 Epoch: 17 Global Step: 211760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:31,467-Speed 3309.18 samples/sec Loss 1.0528 LearningRate 0.0022 Epoch: 17 Global Step: 211770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:06:34,563-Speed 3309.71 samples/sec Loss 1.0003 LearningRate 0.0022 Epoch: 17 Global Step: 211780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:37,664-Speed 3302.76 samples/sec Loss 0.9901 LearningRate 0.0022 Epoch: 17 Global Step: 211790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:40,851-Speed 3213.95 samples/sec Loss 1.0780 LearningRate 0.0022 Epoch: 17 Global Step: 211800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:43,972-Speed 3282.36 samples/sec Loss 1.0066 LearningRate 0.0022 Epoch: 17 Global Step: 211810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:47,081-Speed 3294.83 samples/sec Loss 1.0903 LearningRate 0.0022 Epoch: 17 Global Step: 211820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:50,203-Speed 3281.07 samples/sec Loss 1.0471 LearningRate 0.0022 Epoch: 17 Global Step: 211830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:53,362-Speed 3242.09 samples/sec Loss 1.0509 LearningRate 0.0022 Epoch: 17 Global Step: 211840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:56,465-Speed 3301.09 samples/sec Loss 1.0484 LearningRate 0.0022 Epoch: 17 Global Step: 211850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:06:59,581-Speed 3287.16 samples/sec Loss 1.0147 LearningRate 0.0022 Epoch: 17 Global Step: 211860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:07:02,687-Speed 3298.56 samples/sec Loss 1.0619 LearningRate 0.0022 Epoch: 17 Global Step: 211870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:07:05,809-Speed 3280.31 samples/sec Loss 1.0667 
LearningRate 0.0022 Epoch: 17 Global Step: 211880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:08,923-Speed 3289.04 samples/sec Loss 1.0567 LearningRate 0.0022 Epoch: 17 Global Step: 211890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:12,001-Speed 3328.42 samples/sec Loss 1.0393 LearningRate 0.0022 Epoch: 17 Global Step: 211900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:15,072-Speed 3335.88 samples/sec Loss 1.0729 LearningRate 0.0022 Epoch: 17 Global Step: 211910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:18,192-Speed 3282.63 samples/sec Loss 1.0200 LearningRate 0.0022 Epoch: 17 Global Step: 211920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:21,262-Speed 3337.01 samples/sec Loss 1.0520 LearningRate 0.0022 Epoch: 17 Global Step: 211930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:24,351-Speed 3315.91 samples/sec Loss 1.0278 LearningRate 0.0022 Epoch: 17 Global Step: 211940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:27,480-Speed 3273.52 samples/sec Loss 1.0653 LearningRate 0.0022 Epoch: 17 Global Step: 211950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:30,533-Speed 3356.02 samples/sec Loss 1.0203 LearningRate 0.0022 Epoch: 17 Global Step: 211960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:33,597-Speed 3342.63 samples/sec Loss 1.0340 LearningRate 0.0022 Epoch: 17 Global Step: 211970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:07:36,702-Speed 3299.34 samples/sec Loss 1.0717 LearningRate 0.0022 Epoch: 17 Global Step: 211980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:07:39,837-Speed 3267.41 samples/sec Loss 0.9902 LearningRate 0.0022 Epoch: 17 Global Step: 211990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:07:42,931-Speed 3310.65 samples/sec Loss 1.0264 LearningRate 0.0021 Epoch: 17 Global Step: 
212000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:07:45,997-Speed 3340.36 samples/sec Loss 1.0668 LearningRate 0.0021 Epoch: 17 Global Step: 212010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:07:49,135-Speed 3264.56 samples/sec Loss 1.0264 LearningRate 0.0021 Epoch: 17 Global Step: 212020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:07:52,223-Speed 3317.28 samples/sec Loss 1.0135 LearningRate 0.0021 Epoch: 17 Global Step: 212030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:07:55,354-Speed 3270.96 samples/sec Loss 1.0087 LearningRate 0.0021 Epoch: 17 Global Step: 212040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:07:58,459-Speed 3299.45 samples/sec Loss 1.0643 LearningRate 0.0021 Epoch: 17 Global Step: 212050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:08:01,597-Speed 3263.90 samples/sec Loss 1.0672 LearningRate 0.0021 Epoch: 17 Global Step: 212060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:08:04,797-Speed 3201.20 samples/sec Loss 1.0739 LearningRate 0.0021 Epoch: 17 Global Step: 212070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:08:07,864-Speed 3339.75 samples/sec Loss 1.0369 LearningRate 0.0021 Epoch: 17 Global Step: 212080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 20:08:10,949-Speed 3320.07 samples/sec Loss 1.0525 LearningRate 0.0021 Epoch: 17 Global Step: 212090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:08:14,084-Speed 3267.28 samples/sec Loss 1.0797 LearningRate 0.0021 Epoch: 17 Global Step: 212100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:08:17,239-Speed 3246.43 samples/sec Loss 1.0694 LearningRate 0.0021 Epoch: 17 Global Step: 212110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:08:20,311-Speed 3334.86 samples/sec Loss 1.0372 LearningRate 0.0021 Epoch: 17 Global Step: 212120 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-27 20:08:23,413-Speed 3302.82 samples/sec Loss 1.0434 LearningRate 0.0021 Epoch: 17 Global Step: 212130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:08:26,498-Speed 3320.11 samples/sec Loss 0.9973 LearningRate 0.0021 Epoch: 17 Global Step: 212140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:08:29,647-Speed 3252.77 samples/sec Loss 1.0374 LearningRate 0.0021 Epoch: 17 Global Step: 212150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:08:32,716-Speed 3337.08 samples/sec Loss 1.0923 LearningRate 0.0021 Epoch: 17 Global Step: 212160 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:35,835-Speed 3285.01 samples/sec Loss 1.0472 LearningRate 0.0021 Epoch: 17 Global Step: 212170 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:38,935-Speed 3303.77 samples/sec Loss 1.0974 LearningRate 0.0021 Epoch: 17 Global Step: 212180 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:42,021-Speed 3319.68 samples/sec Loss 1.0416 LearningRate 0.0021 Epoch: 17 Global Step: 212190 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:45,097-Speed 3329.62 samples/sec Loss 1.0771 LearningRate 0.0021 Epoch: 17 Global Step: 212200 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:48,228-Speed 3271.20 samples/sec Loss 1.0243 LearningRate 0.0021 Epoch: 17 Global Step: 212210 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:51,333-Speed 3299.58 samples/sec Loss 1.0014 LearningRate 0.0021 Epoch: 17 Global Step: 212220 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:54,433-Speed 3304.43 samples/sec Loss 1.0524 LearningRate 0.0021 Epoch: 17 Global Step: 212230 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:08:57,558-Speed 3277.79 samples/sec Loss 1.0132 LearningRate 0.0021 Epoch: 17 Global Step: 212240 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:00,635-Speed 
3328.83 samples/sec Loss 1.0724 LearningRate 0.0021 Epoch: 17 Global Step: 212250 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:03,828-Speed 3207.76 samples/sec Loss 1.0793 LearningRate 0.0021 Epoch: 17 Global Step: 212260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:07,067-Speed 3163.04 samples/sec Loss 1.0089 LearningRate 0.0021 Epoch: 17 Global Step: 212270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:10,118-Speed 3357.16 samples/sec Loss 1.0553 LearningRate 0.0021 Epoch: 17 Global Step: 212280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:13,165-Speed 3361.47 samples/sec Loss 1.0279 LearningRate 0.0021 Epoch: 17 Global Step: 212290 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:16,248-Speed 3323.13 samples/sec Loss 1.0818 LearningRate 0.0021 Epoch: 17 Global Step: 212300 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:19,352-Speed 3299.81 samples/sec Loss 1.0595 LearningRate 0.0021 Epoch: 17 Global Step: 212310 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:22,449-Speed 3307.12 samples/sec Loss 1.1018 LearningRate 0.0021 Epoch: 17 Global Step: 212320 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:25,542-Speed 3311.87 samples/sec Loss 1.0890 LearningRate 0.0021 Epoch: 17 Global Step: 212330 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:28,690-Speed 3254.87 samples/sec Loss 1.0460 LearningRate 0.0021 Epoch: 17 Global Step: 212340 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:31,828-Speed 3263.68 samples/sec Loss 1.0695 LearningRate 0.0021 Epoch: 17 Global Step: 212350 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:34,916-Speed 3316.81 samples/sec Loss 1.0683 LearningRate 0.0021 Epoch: 17 Global Step: 212360 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:38,072-Speed 3246.49 samples/sec Loss 1.0865 LearningRate 0.0021 
Epoch: 17 Global Step: 212370 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:41,184-Speed 3291.19 samples/sec Loss 1.0196 LearningRate 0.0021 Epoch: 17 Global Step: 212380 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:09:44,357-Speed 3228.51 samples/sec Loss 1.0694 LearningRate 0.0021 Epoch: 17 Global Step: 212390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:47,493-Speed 3265.65 samples/sec Loss 1.0133 LearningRate 0.0021 Epoch: 17 Global Step: 212400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:50,664-Speed 3230.47 samples/sec Loss 1.0643 LearningRate 0.0021 Epoch: 17 Global Step: 212410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:53,751-Speed 3318.15 samples/sec Loss 1.0617 LearningRate 0.0021 Epoch: 17 Global Step: 212420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:56,819-Speed 3338.91 samples/sec Loss 1.0521 LearningRate 0.0021 Epoch: 17 Global Step: 212430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:09:59,890-Speed 3335.61 samples/sec Loss 1.0541 LearningRate 0.0021 Epoch: 17 Global Step: 212440 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:02,992-Speed 3302.78 samples/sec Loss 1.0094 LearningRate 0.0021 Epoch: 17 Global Step: 212450 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:06,097-Speed 3298.65 samples/sec Loss 1.0882 LearningRate 0.0021 Epoch: 17 Global Step: 212460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:09,192-Speed 3309.91 samples/sec Loss 1.0551 LearningRate 0.0021 Epoch: 17 Global Step: 212470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:12,334-Speed 3259.77 samples/sec Loss 1.0380 LearningRate 0.0021 Epoch: 17 Global Step: 212480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:15,453-Speed 3283.84 samples/sec Loss 1.0318 LearningRate 0.0021 Epoch: 17 Global Step: 212490 Fp16 Grad Scale: 
8192 Required: 3 hours Training: 2022-04-27 20:10:18,561-Speed 3295.89 samples/sec Loss 1.0533 LearningRate 0.0021 Epoch: 17 Global Step: 212500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:21,608-Speed 3361.50 samples/sec Loss 1.0980 LearningRate 0.0021 Epoch: 17 Global Step: 212510 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:24,727-Speed 3284.74 samples/sec Loss 1.0428 LearningRate 0.0021 Epoch: 17 Global Step: 212520 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:27,829-Speed 3302.23 samples/sec Loss 1.0422 LearningRate 0.0021 Epoch: 17 Global Step: 212530 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:10:30,891-Speed 3345.28 samples/sec Loss 1.0484 LearningRate 0.0021 Epoch: 17 Global Step: 212540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:33,995-Speed 3300.42 samples/sec Loss 0.9991 LearningRate 0.0021 Epoch: 17 Global Step: 212550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:37,083-Speed 3317.23 samples/sec Loss 1.0598 LearningRate 0.0021 Epoch: 17 Global Step: 212560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:40,220-Speed 3264.80 samples/sec Loss 1.1023 LearningRate 0.0021 Epoch: 17 Global Step: 212570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:43,305-Speed 3320.78 samples/sec Loss 1.1062 LearningRate 0.0021 Epoch: 17 Global Step: 212580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:46,402-Speed 3307.69 samples/sec Loss 1.0357 LearningRate 0.0021 Epoch: 17 Global Step: 212590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:49,516-Speed 3288.93 samples/sec Loss 1.0994 LearningRate 0.0021 Epoch: 17 Global Step: 212600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:52,617-Speed 3302.72 samples/sec Loss 1.0597 LearningRate 0.0021 Epoch: 17 Global Step: 212610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
20:10:55,746-Speed 3274.63 samples/sec Loss 1.0336 LearningRate 0.0021 Epoch: 17 Global Step: 212620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:10:58,849-Speed 3301.42 samples/sec Loss 1.0700 LearningRate 0.0021 Epoch: 17 Global Step: 212630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:01,927-Speed 3327.34 samples/sec Loss 1.0205 LearningRate 0.0021 Epoch: 17 Global Step: 212640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:11:05,047-Speed 3283.72 samples/sec Loss 1.0390 LearningRate 0.0021 Epoch: 17 Global Step: 212650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:11:08,113-Speed 3340.19 samples/sec Loss 1.0222 LearningRate 0.0021 Epoch: 17 Global Step: 212660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:11:11,175-Speed 3345.86 samples/sec Loss 1.0344 LearningRate 0.0021 Epoch: 17 Global Step: 212670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:11:14,230-Speed 3352.70 samples/sec Loss 1.0425 LearningRate 0.0021 Epoch: 17 Global Step: 212680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:11:17,356-Speed 3276.67 samples/sec Loss 1.0586 LearningRate 0.0021 Epoch: 17 Global Step: 212690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:20,457-Speed 3303.41 samples/sec Loss 1.0480 LearningRate 0.0021 Epoch: 17 Global Step: 212700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:23,619-Speed 3239.07 samples/sec Loss 1.0358 LearningRate 0.0021 Epoch: 17 Global Step: 212710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:26,739-Speed 3282.90 samples/sec Loss 1.0141 LearningRate 0.0021 Epoch: 17 Global Step: 212720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:29,843-Speed 3300.69 samples/sec Loss 1.0667 LearningRate 0.0021 Epoch: 17 Global Step: 212730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:32,952-Speed 3294.27 samples/sec Loss 
1.0730 LearningRate 0.0021 Epoch: 17 Global Step: 212740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:36,067-Speed 3288.36 samples/sec Loss 1.0764 LearningRate 0.0021 Epoch: 17 Global Step: 212750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:39,191-Speed 3279.55 samples/sec Loss 1.0540 LearningRate 0.0021 Epoch: 17 Global Step: 212760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:11:42,301-Speed 3294.24 samples/sec Loss 1.0636 LearningRate 0.0021 Epoch: 17 Global Step: 212770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:11:45,384-Speed 3322.64 samples/sec Loss 1.0768 LearningRate 0.0021 Epoch: 17 Global Step: 212780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:11:48,476-Speed 3312.39 samples/sec Loss 1.0683 LearningRate 0.0021 Epoch: 17 Global Step: 212790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:11:51,607-Speed 3271.78 samples/sec Loss 1.0623 LearningRate 0.0021 Epoch: 17 Global Step: 212800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:11:54,697-Speed 3315.27 samples/sec Loss 1.0105 LearningRate 0.0021 Epoch: 17 Global Step: 212810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:11:57,756-Speed 3348.85 samples/sec Loss 1.0426 LearningRate 0.0021 Epoch: 17 Global Step: 212820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:12:00,925-Speed 3231.69 samples/sec Loss 1.0843 LearningRate 0.0021 Epoch: 17 Global Step: 212830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:12:04,088-Speed 3238.36 samples/sec Loss 1.0343 LearningRate 0.0021 Epoch: 17 Global Step: 212840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:12:07,241-Speed 3248.73 samples/sec Loss 1.0515 LearningRate 0.0021 Epoch: 17 Global Step: 212850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:12:10,365-Speed 3279.72 samples/sec Loss 1.0449 LearningRate 0.0020 Epoch: 17 Global Step: 
212860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:12:13,576-Speed 3189.89 samples/sec Loss 1.0774 LearningRate 0.0020 Epoch: 17 Global Step: 212870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:16,740-Speed 3237.14 samples/sec Loss 1.0783 LearningRate 0.0020 Epoch: 17 Global Step: 212880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:19,880-Speed 3262.53 samples/sec Loss 1.0091 LearningRate 0.0020 Epoch: 17 Global Step: 212890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:23,026-Speed 3255.33 samples/sec Loss 1.0953 LearningRate 0.0020 Epoch: 17 Global Step: 212900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:26,203-Speed 3224.92 samples/sec Loss 1.0325 LearningRate 0.0020 Epoch: 17 Global Step: 212910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:29,425-Speed 3178.28 samples/sec Loss 1.0757 LearningRate 0.0020 Epoch: 17 Global Step: 212920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:32,541-Speed 3287.40 samples/sec Loss 1.0492 LearningRate 0.0020 Epoch: 17 Global Step: 212930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:35,759-Speed 3183.74 samples/sec Loss 1.0643 LearningRate 0.0020 Epoch: 17 Global Step: 212940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:38,924-Speed 3236.58 samples/sec Loss 1.0967 LearningRate 0.0020 Epoch: 17 Global Step: 212950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:42,079-Speed 3246.04 samples/sec Loss 1.0255 LearningRate 0.0020 Epoch: 17 Global Step: 212960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:45,169-Speed 3315.11 samples/sec Loss 1.0684 LearningRate 0.0020 Epoch: 17 Global Step: 212970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:12:48,404-Speed 3166.40 samples/sec Loss 1.0354 LearningRate 0.0020 Epoch: 17 Global Step: 212980 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-27 20:12:51,538-Speed 3268.88 samples/sec Loss 1.0546 LearningRate 0.0020 Epoch: 17 Global Step: 212990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:54,726-Speed 3212.79 samples/sec Loss 1.0452 LearningRate 0.0020 Epoch: 17 Global Step: 213000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:12:57,853-Speed 3275.89 samples/sec Loss 1.0229 LearningRate 0.0020 Epoch: 17 Global Step: 213010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:00,935-Speed 3322.74 samples/sec Loss 1.0357 LearningRate 0.0020 Epoch: 17 Global Step: 213020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:04,085-Speed 3253.01 samples/sec Loss 1.0513 LearningRate 0.0020 Epoch: 17 Global Step: 213030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:07,206-Speed 3281.71 samples/sec Loss 1.0432 LearningRate 0.0020 Epoch: 17 Global Step: 213040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:10,319-Speed 3290.10 samples/sec Loss 1.0652 LearningRate 0.0020 Epoch: 17 Global Step: 213050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:13,427-Speed 3296.41 samples/sec Loss 1.0542 LearningRate 0.0020 Epoch: 17 Global Step: 213060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:16,605-Speed 3222.55 samples/sec Loss 1.0392 LearningRate 0.0020 Epoch: 17 Global Step: 213070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:19,704-Speed 3305.93 samples/sec Loss 1.0824 LearningRate 0.0020 Epoch: 17 Global Step: 213080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:22,790-Speed 3319.52 samples/sec Loss 1.0907 LearningRate 0.0020 Epoch: 17 Global Step: 213090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:25,912-Speed 3281.05 samples/sec Loss 1.0101 LearningRate 0.0020 Epoch: 17 Global Step: 213100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
20:13:29,071-Speed 3242.80 samples/sec Loss 1.0675 LearningRate 0.0020 Epoch: 17 Global Step: 213110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:32,197-Speed 3276.52 samples/sec Loss 1.0463 LearningRate 0.0020 Epoch: 17 Global Step: 213120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:35,305-Speed 3295.70 samples/sec Loss 1.0622 LearningRate 0.0020 Epoch: 17 Global Step: 213130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:38,449-Speed 3258.56 samples/sec Loss 1.0168 LearningRate 0.0020 Epoch: 17 Global Step: 213140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:41,578-Speed 3273.12 samples/sec Loss 1.0616 LearningRate 0.0020 Epoch: 17 Global Step: 213150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:44,678-Speed 3304.80 samples/sec Loss 1.0800 LearningRate 0.0020 Epoch: 17 Global Step: 213160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:47,763-Speed 3319.98 samples/sec Loss 1.0422 LearningRate 0.0020 Epoch: 17 Global Step: 213170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:13:50,985-Speed 3179.37 samples/sec Loss 1.0447 LearningRate 0.0020 Epoch: 17 Global Step: 213180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:13:54,166-Speed 3220.07 samples/sec Loss 1.0435 LearningRate 0.0020 Epoch: 17 Global Step: 213190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:13:57,228-Speed 3345.53 samples/sec Loss 1.0612 LearningRate 0.0020 Epoch: 17 Global Step: 213200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:14:00,304-Speed 3330.12 samples/sec Loss 1.0954 LearningRate 0.0020 Epoch: 17 Global Step: 213210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:14:03,393-Speed 3315.64 samples/sec Loss 1.0890 LearningRate 0.0020 Epoch: 17 Global Step: 213220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:14:06,526-Speed 3269.09 samples/sec Loss 
1.0574 LearningRate 0.0020 Epoch: 17 Global Step: 213230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:14:09,587-Speed 3346.37 samples/sec Loss 1.0914 LearningRate 0.0020 Epoch: 17 Global Step: 213240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:14:12,670-Speed 3323.45 samples/sec Loss 1.0293 LearningRate 0.0020 Epoch: 17 Global Step: 213250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:14:15,780-Speed 3293.27 samples/sec Loss 1.0630 LearningRate 0.0020 Epoch: 17 Global Step: 213260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:14:18,902-Speed 3280.84 samples/sec Loss 1.0217 LearningRate 0.0020 Epoch: 17 Global Step: 213270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:14:21,969-Speed 3339.92 samples/sec Loss 1.0218 LearningRate 0.0020 Epoch: 17 Global Step: 213280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:14:25,082-Speed 3291.24 samples/sec Loss 1.0584 LearningRate 0.0020 Epoch: 17 Global Step: 213290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:14:28,142-Speed 3347.30 samples/sec Loss 1.0515 LearningRate 0.0020 Epoch: 17 Global Step: 213300 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:31,307-Speed 3235.83 samples/sec Loss 1.0605 LearningRate 0.0020 Epoch: 17 Global Step: 213310 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:34,410-Speed 3301.89 samples/sec Loss 1.0411 LearningRate 0.0020 Epoch: 17 Global Step: 213320 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:37,618-Speed 3192.18 samples/sec Loss 1.0949 LearningRate 0.0020 Epoch: 17 Global Step: 213330 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:40,724-Speed 3298.09 samples/sec Loss 1.0317 LearningRate 0.0020 Epoch: 17 Global Step: 213340 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:43,900-Speed 3225.31 samples/sec Loss 1.0435 LearningRate 0.0020 Epoch: 17 Global 
Step: 213350 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:46,977-Speed 3329.28 samples/sec Loss 1.0565 LearningRate 0.0020 Epoch: 17 Global Step: 213360 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:50,104-Speed 3276.18 samples/sec Loss 1.0264 LearningRate 0.0020 Epoch: 17 Global Step: 213370 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:53,237-Speed 3269.27 samples/sec Loss 1.0356 LearningRate 0.0020 Epoch: 17 Global Step: 213380 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:56,368-Speed 3271.52 samples/sec Loss 1.0842 LearningRate 0.0020 Epoch: 17 Global Step: 213390 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:14:59,451-Speed 3322.63 samples/sec Loss 1.0780 LearningRate 0.0020 Epoch: 17 Global Step: 213400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:02,572-Speed 3281.61 samples/sec Loss 1.0255 LearningRate 0.0020 Epoch: 17 Global Step: 213410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:05,647-Speed 3331.65 samples/sec Loss 1.0529 LearningRate 0.0020 Epoch: 17 Global Step: 213420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:08,715-Speed 3338.54 samples/sec Loss 1.0380 LearningRate 0.0020 Epoch: 17 Global Step: 213430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:11,805-Speed 3315.31 samples/sec Loss 1.0327 LearningRate 0.0020 Epoch: 17 Global Step: 213440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:14,917-Speed 3291.28 samples/sec Loss 1.0339 LearningRate 0.0020 Epoch: 17 Global Step: 213450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:18,102-Speed 3216.48 samples/sec Loss 1.0250 LearningRate 0.0020 Epoch: 17 Global Step: 213460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:21,152-Speed 3358.05 samples/sec Loss 1.0335 LearningRate 0.0020 Epoch: 17 Global Step: 213470 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-27 20:15:24,224-Speed 3334.71 samples/sec Loss 1.0260 LearningRate 0.0020 Epoch: 17 Global Step: 213480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:27,317-Speed 3312.14 samples/sec Loss 1.0848 LearningRate 0.0020 Epoch: 17 Global Step: 213490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:30,400-Speed 3321.93 samples/sec Loss 1.0266 LearningRate 0.0020 Epoch: 17 Global Step: 213500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:15:33,486-Speed 3318.93 samples/sec Loss 1.0789 LearningRate 0.0020 Epoch: 17 Global Step: 213510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:36,644-Speed 3243.75 samples/sec Loss 1.0711 LearningRate 0.0020 Epoch: 17 Global Step: 213520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:39,801-Speed 3244.49 samples/sec Loss 1.0332 LearningRate 0.0020 Epoch: 17 Global Step: 213530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:42,890-Speed 3316.20 samples/sec Loss 1.0739 LearningRate 0.0020 Epoch: 17 Global Step: 213540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:45,974-Speed 3322.05 samples/sec Loss 1.0861 LearningRate 0.0020 Epoch: 17 Global Step: 213550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:15:49,042-Speed 3337.73 samples/sec Loss 1.0740 LearningRate 0.0020 Epoch: 17 Global Step: 213560 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:15:52,209-Speed 3235.12 samples/sec Loss 1.0424 LearningRate 0.0020 Epoch: 17 Global Step: 213570 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:15:55,342-Speed 3269.43 samples/sec Loss 1.0331 LearningRate 0.0020 Epoch: 17 Global Step: 213580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:15:58,395-Speed 3355.29 samples/sec Loss 1.0461 LearningRate 0.0020 Epoch: 17 Global Step: 213590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:16:01,501-Speed 
3297.98 samples/sec Loss 1.0930 LearningRate 0.0020 Epoch: 17 Global Step: 213600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:16:04,651-Speed 3251.37 samples/sec Loss 1.0702 LearningRate 0.0020 Epoch: 17 Global Step: 213610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:16:07,799-Speed 3253.92 samples/sec Loss 1.0919 LearningRate 0.0020 Epoch: 17 Global Step: 213620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:16:10,883-Speed 3321.32 samples/sec Loss 1.0490 LearningRate 0.0020 Epoch: 17 Global Step: 213630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:16:14,045-Speed 3239.91 samples/sec Loss 1.0480 LearningRate 0.0020 Epoch: 17 Global Step: 213640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:16:17,142-Speed 3307.74 samples/sec Loss 1.0679 LearningRate 0.0020 Epoch: 17 Global Step: 213650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:16:20,202-Speed 3347.69 samples/sec Loss 1.0600 LearningRate 0.0020 Epoch: 17 Global Step: 213660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:23,268-Speed 3340.89 samples/sec Loss 1.0882 LearningRate 0.0020 Epoch: 17 Global Step: 213670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:26,360-Speed 3312.53 samples/sec Loss 1.0850 LearningRate 0.0020 Epoch: 17 Global Step: 213680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:29,469-Speed 3294.77 samples/sec Loss 1.0966 LearningRate 0.0020 Epoch: 17 Global Step: 213690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:32,547-Speed 3327.92 samples/sec Loss 1.0447 LearningRate 0.0020 Epoch: 17 Global Step: 213700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:35,650-Speed 3300.43 samples/sec Loss 0.9988 LearningRate 0.0020 Epoch: 17 Global Step: 213710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:38,810-Speed 3242.08 samples/sec Loss 1.0814 LearningRate 
0.0020 Epoch: 17 Global Step: 213720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:41,901-Speed 3314.31 samples/sec Loss 1.0547 LearningRate 0.0020 Epoch: 17 Global Step: 213730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:44,954-Speed 3355.17 samples/sec Loss 1.0627 LearningRate 0.0019 Epoch: 17 Global Step: 213740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:48,042-Speed 3317.06 samples/sec Loss 1.0360 LearningRate 0.0019 Epoch: 17 Global Step: 213750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:16:51,158-Speed 3287.09 samples/sec Loss 1.0500 LearningRate 0.0019 Epoch: 17 Global Step: 213760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:16:54,304-Speed 3255.68 samples/sec Loss 1.0807 LearningRate 0.0019 Epoch: 17 Global Step: 213770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:16:57,385-Speed 3326.65 samples/sec Loss 1.0395 LearningRate 0.0019 Epoch: 17 Global Step: 213780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:17:00,428-Speed 3366.38 samples/sec Loss 1.0551 LearningRate 0.0019 Epoch: 17 Global Step: 213790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:03,557-Speed 3273.02 samples/sec Loss 1.0023 LearningRate 0.0019 Epoch: 17 Global Step: 213800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:06,656-Speed 3305.88 samples/sec Loss 1.0403 LearningRate 0.0019 Epoch: 17 Global Step: 213810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:09,724-Speed 3338.86 samples/sec Loss 1.0911 LearningRate 0.0019 Epoch: 17 Global Step: 213820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:12,844-Speed 3283.58 samples/sec Loss 1.0496 LearningRate 0.0019 Epoch: 17 Global Step: 213830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:15,978-Speed 3267.42 samples/sec Loss 1.0739 LearningRate 0.0019 Epoch: 17 Global Step: 213840 Fp16 
Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:19,064-Speed 3319.83 samples/sec Loss 1.1080 LearningRate 0.0019 Epoch: 17 Global Step: 213850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:22,155-Speed 3314.35 samples/sec Loss 1.0947 LearningRate 0.0019 Epoch: 17 Global Step: 213860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:25,271-Speed 3287.62 samples/sec Loss 1.0358 LearningRate 0.0019 Epoch: 17 Global Step: 213870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:28,432-Speed 3239.90 samples/sec Loss 1.0453 LearningRate 0.0019 Epoch: 17 Global Step: 213880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:17:31,469-Speed 3373.71 samples/sec Loss 1.0245 LearningRate 0.0019 Epoch: 17 Global Step: 213890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:34,570-Speed 3303.57 samples/sec Loss 1.0935 LearningRate 0.0019 Epoch: 17 Global Step: 213900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:37,765-Speed 3205.81 samples/sec Loss 1.0450 LearningRate 0.0019 Epoch: 17 Global Step: 213910 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:40,854-Speed 3315.74 samples/sec Loss 1.0943 LearningRate 0.0019 Epoch: 17 Global Step: 213920 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:43,947-Speed 3311.39 samples/sec Loss 1.0629 LearningRate 0.0019 Epoch: 17 Global Step: 213930 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:47,002-Speed 3353.78 samples/sec Loss 1.0607 LearningRate 0.0019 Epoch: 17 Global Step: 213940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:50,106-Speed 3299.84 samples/sec Loss 1.0631 LearningRate 0.0019 Epoch: 17 Global Step: 213950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:53,236-Speed 3272.41 samples/sec Loss 1.0742 LearningRate 0.0019 Epoch: 17 Global Step: 213960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 
2022-04-27 20:17:56,322-Speed 3320.01 samples/sec Loss 1.0592 LearningRate 0.0019 Epoch: 17 Global Step: 213970 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:17:59,439-Speed 3286.19 samples/sec Loss 1.0730 LearningRate 0.0019 Epoch: 17 Global Step: 213980 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:18:02,530-Speed 3313.27 samples/sec Loss 1.0519 LearningRate 0.0019 Epoch: 17 Global Step: 213990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:05,615-Speed 3320.97 samples/sec Loss 1.0976 LearningRate 0.0019 Epoch: 17 Global Step: 214000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:08,674-Speed 3348.54 samples/sec Loss 1.0422 LearningRate 0.0019 Epoch: 17 Global Step: 214010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:11,816-Speed 3260.40 samples/sec Loss 1.0660 LearningRate 0.0019 Epoch: 17 Global Step: 214020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:14,975-Speed 3242.37 samples/sec Loss 1.0670 LearningRate 0.0019 Epoch: 17 Global Step: 214030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:18,109-Speed 3268.07 samples/sec Loss 1.0657 LearningRate 0.0019 Epoch: 17 Global Step: 214040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:21,231-Speed 3281.95 samples/sec Loss 1.0637 LearningRate 0.0019 Epoch: 17 Global Step: 214050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:24,363-Speed 3269.40 samples/sec Loss 1.0294 LearningRate 0.0019 Epoch: 17 Global Step: 214060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:27,458-Speed 3310.62 samples/sec Loss 1.0382 LearningRate 0.0019 Epoch: 17 Global Step: 214070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:30,542-Speed 3321.45 samples/sec Loss 1.0579 LearningRate 0.0019 Epoch: 17 Global Step: 214080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:33,636-Speed 3309.67 
samples/sec Loss 1.0074 LearningRate 0.0019 Epoch: 17 Global Step: 214090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:18:36,714-Speed 3328.74 samples/sec Loss 1.0615 LearningRate 0.0019 Epoch: 17 Global Step: 214100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:18:39,868-Speed 3247.23 samples/sec Loss 1.0892 LearningRate 0.0019 Epoch: 17 Global Step: 214110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:18:42,965-Speed 3308.00 samples/sec Loss 1.0602 LearningRate 0.0019 Epoch: 17 Global Step: 214120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:18:46,034-Speed 3337.99 samples/sec Loss 1.0876 LearningRate 0.0019 Epoch: 17 Global Step: 214130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:18:49,134-Speed 3304.39 samples/sec Loss 1.0688 LearningRate 0.0019 Epoch: 17 Global Step: 214140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:52,225-Speed 3313.79 samples/sec Loss 1.0489 LearningRate 0.0019 Epoch: 17 Global Step: 214150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:55,335-Speed 3293.46 samples/sec Loss 1.0095 LearningRate 0.0019 Epoch: 17 Global Step: 214160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:18:58,400-Speed 3341.65 samples/sec Loss 1.1243 LearningRate 0.0019 Epoch: 17 Global Step: 214170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:01,540-Speed 3262.05 samples/sec Loss 1.0220 LearningRate 0.0019 Epoch: 17 Global Step: 214180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:04,627-Speed 3319.03 samples/sec Loss 1.0423 LearningRate 0.0019 Epoch: 17 Global Step: 214190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:07,758-Speed 3271.44 samples/sec Loss 1.0672 LearningRate 0.0019 Epoch: 17 Global Step: 214200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:10,829-Speed 3335.29 samples/sec Loss 1.0567 LearningRate 0.0019 
Epoch: 17 Global Step: 214210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:13,899-Speed 3336.40 samples/sec Loss 1.0546 LearningRate 0.0019 Epoch: 17 Global Step: 214220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:16,954-Speed 3352.66 samples/sec Loss 1.0646 LearningRate 0.0019 Epoch: 17 Global Step: 214230 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:20,041-Speed 3318.72 samples/sec Loss 1.0582 LearningRate 0.0019 Epoch: 17 Global Step: 214240 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:23,115-Speed 3331.71 samples/sec Loss 1.0631 LearningRate 0.0019 Epoch: 17 Global Step: 214250 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:26,180-Speed 3342.90 samples/sec Loss 1.0375 LearningRate 0.0019 Epoch: 17 Global Step: 214260 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:29,284-Speed 3299.82 samples/sec Loss 1.0591 LearningRate 0.0019 Epoch: 17 Global Step: 214270 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:32,375-Speed 3314.46 samples/sec Loss 1.0375 LearningRate 0.0019 Epoch: 17 Global Step: 214280 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:35,473-Speed 3306.41 samples/sec Loss 1.1019 LearningRate 0.0019 Epoch: 17 Global Step: 214290 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:38,575-Speed 3302.27 samples/sec Loss 1.0573 LearningRate 0.0019 Epoch: 17 Global Step: 214300 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:41,709-Speed 3268.34 samples/sec Loss 1.0659 LearningRate 0.0019 Epoch: 17 Global Step: 214310 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:44,779-Speed 3337.05 samples/sec Loss 1.0135 LearningRate 0.0019 Epoch: 17 Global Step: 214320 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:19:47,846-Speed 3339.41 samples/sec Loss 1.0488 LearningRate 0.0019 Epoch: 17 Global Step: 214330 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-04-27 20:19:50,940-Speed 3310.37 samples/sec Loss 1.0144 LearningRate 0.0019 Epoch: 17 Global Step: 214340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:54,044-Speed 3300.13 samples/sec Loss 1.1141 LearningRate 0.0019 Epoch: 17 Global Step: 214350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:19:57,095-Speed 3357.27 samples/sec Loss 1.0264 LearningRate 0.0019 Epoch: 17 Global Step: 214360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:00,237-Speed 3259.70 samples/sec Loss 1.0202 LearningRate 0.0019 Epoch: 17 Global Step: 214370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:03,363-Speed 3276.78 samples/sec Loss 1.0869 LearningRate 0.0019 Epoch: 17 Global Step: 214380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:06,455-Speed 3313.81 samples/sec Loss 1.0513 LearningRate 0.0019 Epoch: 17 Global Step: 214390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:09,556-Speed 3302.35 samples/sec Loss 1.0568 LearningRate 0.0019 Epoch: 17 Global Step: 214400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:12,624-Speed 3339.17 samples/sec Loss 1.0942 LearningRate 0.0019 Epoch: 17 Global Step: 214410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:15,732-Speed 3296.06 samples/sec Loss 1.1017 LearningRate 0.0019 Epoch: 17 Global Step: 214420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:18,826-Speed 3310.21 samples/sec Loss 1.0804 LearningRate 0.0019 Epoch: 17 Global Step: 214430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:20:21,879-Speed 3355.47 samples/sec Loss 1.0675 LearningRate 0.0019 Epoch: 17 Global Step: 214440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:20:25,022-Speed 3258.72 samples/sec Loss 1.0288 LearningRate 0.0019 Epoch: 17 Global Step: 214450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
20:20:28,157-Speed 3267.88 samples/sec Loss 1.1071 LearningRate 0.0019 Epoch: 17 Global Step: 214460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:31,295-Speed 3263.79 samples/sec Loss 1.0870 LearningRate 0.0019 Epoch: 17 Global Step: 214470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:34,367-Speed 3334.81 samples/sec Loss 1.0842 LearningRate 0.0019 Epoch: 17 Global Step: 214480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:37,513-Speed 3256.09 samples/sec Loss 1.0465 LearningRate 0.0019 Epoch: 17 Global Step: 214490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:40,612-Speed 3305.13 samples/sec Loss 1.0827 LearningRate 0.0019 Epoch: 17 Global Step: 214500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:43,808-Speed 3204.64 samples/sec Loss 0.9996 LearningRate 0.0019 Epoch: 17 Global Step: 214510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:46,910-Speed 3302.30 samples/sec Loss 1.0428 LearningRate 0.0019 Epoch: 17 Global Step: 214520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:50,046-Speed 3266.48 samples/sec Loss 1.0686 LearningRate 0.0019 Epoch: 17 Global Step: 214530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:53,207-Speed 3240.36 samples/sec Loss 1.0916 LearningRate 0.0019 Epoch: 17 Global Step: 214540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:20:56,303-Speed 3308.89 samples/sec Loss 1.0585 LearningRate 0.0019 Epoch: 17 Global Step: 214550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:20:59,388-Speed 3320.29 samples/sec Loss 1.0273 LearningRate 0.0019 Epoch: 17 Global Step: 214560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:02,541-Speed 3248.38 samples/sec Loss 1.0969 LearningRate 0.0019 Epoch: 17 Global Step: 214570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:05,643-Speed 3302.39 samples/sec Loss 
1.0740 LearningRate 0.0019 Epoch: 17 Global Step: 214580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:08,724-Speed 3324.77 samples/sec Loss 1.0615 LearningRate 0.0019 Epoch: 17 Global Step: 214590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:11,859-Speed 3267.21 samples/sec Loss 1.0837 LearningRate 0.0019 Epoch: 17 Global Step: 214600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:14,949-Speed 3315.35 samples/sec Loss 1.0804 LearningRate 0.0019 Epoch: 17 Global Step: 214610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:18,102-Speed 3248.00 samples/sec Loss 1.0555 LearningRate 0.0019 Epoch: 17 Global Step: 214620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:21,178-Speed 3330.05 samples/sec Loss 1.0561 LearningRate 0.0019 Epoch: 17 Global Step: 214630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:24,324-Speed 3255.80 samples/sec Loss 1.0790 LearningRate 0.0018 Epoch: 17 Global Step: 214640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:27,492-Speed 3233.31 samples/sec Loss 1.0696 LearningRate 0.0018 Epoch: 17 Global Step: 214650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:30,620-Speed 3275.07 samples/sec Loss 1.0615 LearningRate 0.0018 Epoch: 17 Global Step: 214660 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:33,737-Speed 3286.51 samples/sec Loss 1.0726 LearningRate 0.0018 Epoch: 17 Global Step: 214670 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:21:36,910-Speed 3227.79 samples/sec Loss 1.0166 LearningRate 0.0018 Epoch: 17 Global Step: 214680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:40,008-Speed 3307.26 samples/sec Loss 1.0983 LearningRate 0.0018 Epoch: 17 Global Step: 214690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:43,135-Speed 3275.65 samples/sec Loss 1.0444 LearningRate 0.0018 Epoch: 17 Global Step: 
214700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:46,246-Speed 3292.10 samples/sec Loss 1.0414 LearningRate 0.0018 Epoch: 17 Global Step: 214710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:49,365-Speed 3283.88 samples/sec Loss 1.0827 LearningRate 0.0018 Epoch: 17 Global Step: 214720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:52,495-Speed 3273.38 samples/sec Loss 1.0623 LearningRate 0.0018 Epoch: 17 Global Step: 214730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:56,313-Speed 2682.68 samples/sec Loss 1.0601 LearningRate 0.0018 Epoch: 17 Global Step: 214740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:21:59,413-Speed 3303.37 samples/sec Loss 1.0465 LearningRate 0.0018 Epoch: 17 Global Step: 214750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:02,497-Speed 3321.67 samples/sec Loss 1.0218 LearningRate 0.0018 Epoch: 17 Global Step: 214760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:05,664-Speed 3234.66 samples/sec Loss 1.0409 LearningRate 0.0018 Epoch: 17 Global Step: 214770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:08,780-Speed 3287.02 samples/sec Loss 1.0931 LearningRate 0.0018 Epoch: 17 Global Step: 214780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:11,893-Speed 3290.90 samples/sec Loss 1.0825 LearningRate 0.0018 Epoch: 17 Global Step: 214790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:15,036-Speed 3258.95 samples/sec Loss 1.0691 LearningRate 0.0018 Epoch: 17 Global Step: 214800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:18,117-Speed 3324.98 samples/sec Loss 1.0287 LearningRate 0.0018 Epoch: 17 Global Step: 214810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:21,183-Speed 3340.97 samples/sec Loss 1.0368 LearningRate 0.0018 Epoch: 17 Global Step: 214820 Fp16 Grad Scale: 8192 Required: 3 hours 
Training: 2022-04-27 20:22:24,297-Speed 3289.18 samples/sec Loss 1.0668 LearningRate 0.0018 Epoch: 17 Global Step: 214830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:27,403-Speed 3298.55 samples/sec Loss 1.0597 LearningRate 0.0018 Epoch: 17 Global Step: 214840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:30,509-Speed 3297.49 samples/sec Loss 1.0818 LearningRate 0.0018 Epoch: 17 Global Step: 214850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:22:33,581-Speed 3334.66 samples/sec Loss 1.0445 LearningRate 0.0018 Epoch: 17 Global Step: 214860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:36,751-Speed 3230.64 samples/sec Loss 1.0492 LearningRate 0.0018 Epoch: 17 Global Step: 214870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:39,897-Speed 3256.30 samples/sec Loss 1.0768 LearningRate 0.0018 Epoch: 17 Global Step: 214880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:43,030-Speed 3269.59 samples/sec Loss 1.0824 LearningRate 0.0018 Epoch: 17 Global Step: 214890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:46,140-Speed 3293.97 samples/sec Loss 1.0529 LearningRate 0.0018 Epoch: 17 Global Step: 214900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:49,231-Speed 3314.12 samples/sec Loss 1.0555 LearningRate 0.0018 Epoch: 17 Global Step: 214910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:52,296-Speed 3342.78 samples/sec Loss 1.0762 LearningRate 0.0018 Epoch: 17 Global Step: 214920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:55,406-Speed 3293.74 samples/sec Loss 1.0842 LearningRate 0.0018 Epoch: 17 Global Step: 214930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:22:58,449-Speed 3365.86 samples/sec Loss 1.0246 LearningRate 0.0018 Epoch: 17 Global Step: 214940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:01,508-Speed 3348.02 
samples/sec Loss 1.0293 LearningRate 0.0018 Epoch: 17 Global Step: 214950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:04,580-Speed 3334.56 samples/sec Loss 1.0537 LearningRate 0.0018 Epoch: 17 Global Step: 214960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:07,659-Speed 3326.87 samples/sec Loss 1.0590 LearningRate 0.0018 Epoch: 17 Global Step: 214970 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:10,726-Speed 3339.98 samples/sec Loss 1.0566 LearningRate 0.0018 Epoch: 17 Global Step: 214980 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:13,792-Speed 3341.53 samples/sec Loss 1.1191 LearningRate 0.0018 Epoch: 17 Global Step: 214990 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:16,944-Speed 3250.24 samples/sec Loss 1.0512 LearningRate 0.0018 Epoch: 17 Global Step: 215000 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:20,054-Speed 3293.47 samples/sec Loss 1.0547 LearningRate 0.0018 Epoch: 17 Global Step: 215010 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:23,158-Speed 3300.07 samples/sec Loss 1.0523 LearningRate 0.0018 Epoch: 17 Global Step: 215020 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:26,297-Speed 3262.32 samples/sec Loss 1.1366 LearningRate 0.0018 Epoch: 17 Global Step: 215030 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:23:29,394-Speed 3307.75 samples/sec Loss 1.0573 LearningRate 0.0018 Epoch: 17 Global Step: 215040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:32,456-Speed 3346.12 samples/sec Loss 1.0619 LearningRate 0.0018 Epoch: 17 Global Step: 215050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:35,592-Speed 3265.45 samples/sec Loss 1.0714 LearningRate 0.0018 Epoch: 17 Global Step: 215060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:38,663-Speed 3335.45 samples/sec Loss 1.1144 LearningRate 0.0018 Epoch: 
17 Global Step: 215070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:41,733-Speed 3336.91 samples/sec Loss 1.0531 LearningRate 0.0018 Epoch: 17 Global Step: 215080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:44,786-Speed 3354.84 samples/sec Loss 1.0773 LearningRate 0.0018 Epoch: 17 Global Step: 215090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:47,873-Speed 3318.60 samples/sec Loss 1.0586 LearningRate 0.0018 Epoch: 17 Global Step: 215100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:51,002-Speed 3273.51 samples/sec Loss 1.0410 LearningRate 0.0018 Epoch: 17 Global Step: 215110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:54,107-Speed 3299.10 samples/sec Loss 1.0571 LearningRate 0.0018 Epoch: 17 Global Step: 215120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:23:57,223-Speed 3287.48 samples/sec Loss 1.0517 LearningRate 0.0018 Epoch: 17 Global Step: 215130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:00,336-Speed 3289.96 samples/sec Loss 1.0540 LearningRate 0.0018 Epoch: 17 Global Step: 215140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:03,443-Speed 3297.05 samples/sec Loss 1.0627 LearningRate 0.0018 Epoch: 17 Global Step: 215150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:06,521-Speed 3328.09 samples/sec Loss 1.0481 LearningRate 0.0018 Epoch: 17 Global Step: 215160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:09,595-Speed 3332.09 samples/sec Loss 1.0497 LearningRate 0.0018 Epoch: 17 Global Step: 215170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:12,676-Speed 3324.54 samples/sec Loss 1.0902 LearningRate 0.0018 Epoch: 17 Global Step: 215180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:15,770-Speed 3311.34 samples/sec Loss 1.0524 LearningRate 0.0018 Epoch: 17 Global Step: 215190 Fp16 Grad Scale: 
16384 Required: 3 hours Training: 2022-04-27 20:24:18,890-Speed 3282.36 samples/sec Loss 1.1048 LearningRate 0.0018 Epoch: 17 Global Step: 215200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:21,946-Speed 3352.20 samples/sec Loss 1.0635 LearningRate 0.0018 Epoch: 17 Global Step: 215210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:25,016-Speed 3335.99 samples/sec Loss 1.0725 LearningRate 0.0018 Epoch: 17 Global Step: 215220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:28,089-Speed 3334.62 samples/sec Loss 1.0574 LearningRate 0.0018 Epoch: 17 Global Step: 215230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:31,164-Speed 3331.66 samples/sec Loss 1.0613 LearningRate 0.0018 Epoch: 17 Global Step: 215240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:34,222-Speed 3349.25 samples/sec Loss 1.0889 LearningRate 0.0018 Epoch: 17 Global Step: 215250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:24:37,333-Speed 3292.37 samples/sec Loss 1.0869 LearningRate 0.0018 Epoch: 17 Global Step: 215260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:40,474-Speed 3261.05 samples/sec Loss 1.0340 LearningRate 0.0018 Epoch: 17 Global Step: 215270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:43,604-Speed 3272.30 samples/sec Loss 1.0904 LearningRate 0.0018 Epoch: 17 Global Step: 215280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:46,712-Speed 3296.15 samples/sec Loss 1.0650 LearningRate 0.0018 Epoch: 17 Global Step: 215290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:49,855-Speed 3258.82 samples/sec Loss 1.1098 LearningRate 0.0018 Epoch: 17 Global Step: 215300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:52,925-Speed 3336.42 samples/sec Loss 1.0509 LearningRate 0.0018 Epoch: 17 Global Step: 215310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-27 20:24:55,982-Speed 3350.74 samples/sec Loss 1.0529 LearningRate 0.0018 Epoch: 17 Global Step: 215320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:24:59,068-Speed 3319.57 samples/sec Loss 1.0211 LearningRate 0.0018 Epoch: 17 Global Step: 215330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:25:02,148-Speed 3325.47 samples/sec Loss 1.0585 LearningRate 0.0018 Epoch: 17 Global Step: 215340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:05,249-Speed 3303.62 samples/sec Loss 1.0617 LearningRate 0.0018 Epoch: 17 Global Step: 215350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:08,334-Speed 3320.69 samples/sec Loss 1.1104 LearningRate 0.0018 Epoch: 17 Global Step: 215360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:11,390-Speed 3351.99 samples/sec Loss 1.0464 LearningRate 0.0018 Epoch: 17 Global Step: 215370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:14,469-Speed 3326.22 samples/sec Loss 1.0635 LearningRate 0.0018 Epoch: 17 Global Step: 215380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:17,531-Speed 3345.76 samples/sec Loss 1.0606 LearningRate 0.0018 Epoch: 17 Global Step: 215390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:20,609-Speed 3327.74 samples/sec Loss 1.1033 LearningRate 0.0018 Epoch: 17 Global Step: 215400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:23,668-Speed 3348.91 samples/sec Loss 1.0824 LearningRate 0.0018 Epoch: 17 Global Step: 215410 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:26,806-Speed 3264.16 samples/sec Loss 1.0566 LearningRate 0.0018 Epoch: 17 Global Step: 215420 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:29,869-Speed 3344.75 samples/sec Loss 1.0748 LearningRate 0.0018 Epoch: 17 Global Step: 215430 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:32,973-Speed 3299.36 
samples/sec Loss 1.0834 LearningRate 0.0018 Epoch: 17 Global Step: 215440 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:36,168-Speed 3206.14 samples/sec Loss 1.0411 LearningRate 0.0018 Epoch: 17 Global Step: 215450 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:39,299-Speed 3271.39 samples/sec Loss 1.0243 LearningRate 0.0018 Epoch: 17 Global Step: 215460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:42,397-Speed 3306.46 samples/sec Loss 1.0174 LearningRate 0.0018 Epoch: 17 Global Step: 215470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:45,486-Speed 3316.66 samples/sec Loss 1.0594 LearningRate 0.0018 Epoch: 17 Global Step: 215480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:48,569-Speed 3322.92 samples/sec Loss 1.0936 LearningRate 0.0018 Epoch: 17 Global Step: 215490 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:51,649-Speed 3325.56 samples/sec Loss 1.0340 LearningRate 0.0018 Epoch: 17 Global Step: 215500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:25:54,765-Speed 3286.63 samples/sec Loss 1.0939 LearningRate 0.0018 Epoch: 17 Global Step: 215510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:25:57,813-Speed 3360.73 samples/sec Loss 1.0999 LearningRate 0.0018 Epoch: 17 Global Step: 215520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:00,896-Speed 3322.32 samples/sec Loss 1.0743 LearningRate 0.0018 Epoch: 17 Global Step: 215530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:03,991-Speed 3309.15 samples/sec Loss 1.0373 LearningRate 0.0018 Epoch: 17 Global Step: 215540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:07,134-Speed 3259.18 samples/sec Loss 1.1084 LearningRate 0.0018 Epoch: 17 Global Step: 215550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:10,204-Speed 3336.69 samples/sec Loss 1.0707 LearningRate 0.0017 Epoch: 
17 Global Step: 215560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:13,300-Speed 3308.40 samples/sec Loss 1.0805 LearningRate 0.0017 Epoch: 17 Global Step: 215570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:16,469-Speed 3232.53 samples/sec Loss 1.0595 LearningRate 0.0017 Epoch: 17 Global Step: 215580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:19,556-Speed 3318.62 samples/sec Loss 1.0541 LearningRate 0.0017 Epoch: 17 Global Step: 215590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:22,687-Speed 3271.18 samples/sec Loss 1.0555 LearningRate 0.0017 Epoch: 17 Global Step: 215600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:25,755-Speed 3339.62 samples/sec Loss 1.0672 LearningRate 0.0017 Epoch: 17 Global Step: 215610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:26:28,856-Speed 3303.13 samples/sec Loss 1.0650 LearningRate 0.0017 Epoch: 17 Global Step: 215620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:26:31,943-Speed 3318.29 samples/sec Loss 1.0408 LearningRate 0.0017 Epoch: 17 Global Step: 215630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:34,996-Speed 3355.14 samples/sec Loss 1.1016 LearningRate 0.0017 Epoch: 17 Global Step: 215640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:38,047-Speed 3357.25 samples/sec Loss 1.0939 LearningRate 0.0017 Epoch: 17 Global Step: 215650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:41,185-Speed 3263.92 samples/sec Loss 1.0477 LearningRate 0.0017 Epoch: 17 Global Step: 215660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:44,274-Speed 3315.86 samples/sec Loss 1.0842 LearningRate 0.0017 Epoch: 17 Global Step: 215670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:47,429-Speed 3247.52 samples/sec Loss 1.0798 LearningRate 0.0017 Epoch: 17 Global Step: 215680 Fp16 Grad Scale: 
16384 Required: 3 hours Training: 2022-04-27 20:26:50,500-Speed 3335.34 samples/sec Loss 1.0151 LearningRate 0.0017 Epoch: 17 Global Step: 215690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:53,585-Speed 3320.22 samples/sec Loss 1.0544 LearningRate 0.0017 Epoch: 17 Global Step: 215700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:56,702-Speed 3286.37 samples/sec Loss 1.0955 LearningRate 0.0017 Epoch: 17 Global Step: 215710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:26:59,860-Speed 3243.72 samples/sec Loss 1.0878 LearningRate 0.0017 Epoch: 17 Global Step: 215720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:03,029-Speed 3232.77 samples/sec Loss 1.0684 LearningRate 0.0017 Epoch: 17 Global Step: 215730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:27:06,163-Speed 3267.99 samples/sec Loss 1.1064 LearningRate 0.0017 Epoch: 17 Global Step: 215740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:09,240-Speed 3329.59 samples/sec Loss 1.1018 LearningRate 0.0017 Epoch: 17 Global Step: 215750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:12,373-Speed 3269.22 samples/sec Loss 1.0663 LearningRate 0.0017 Epoch: 17 Global Step: 215760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:15,470-Speed 3307.29 samples/sec Loss 1.0233 LearningRate 0.0017 Epoch: 17 Global Step: 215770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:18,527-Speed 3351.29 samples/sec Loss 1.0241 LearningRate 0.0017 Epoch: 17 Global Step: 215780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:21,561-Speed 3375.23 samples/sec Loss 1.0661 LearningRate 0.0017 Epoch: 17 Global Step: 215790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:24,735-Speed 3227.51 samples/sec Loss 1.0737 LearningRate 0.0017 Epoch: 17 Global Step: 215800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 
20:27:27,868-Speed 3269.37 samples/sec Loss 1.1193 LearningRate 0.0017 Epoch: 17 Global Step: 215810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:30,983-Speed 3288.90 samples/sec Loss 1.0843 LearningRate 0.0017 Epoch: 17 Global Step: 215820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:34,082-Speed 3305.26 samples/sec Loss 1.0125 LearningRate 0.0017 Epoch: 17 Global Step: 215830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:37,191-Speed 3294.56 samples/sec Loss 1.0904 LearningRate 0.0017 Epoch: 17 Global Step: 215840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:40,267-Speed 3331.06 samples/sec Loss 1.0437 LearningRate 0.0017 Epoch: 17 Global Step: 215850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:43,448-Speed 3219.57 samples/sec Loss 1.0629 LearningRate 0.0017 Epoch: 17 Global Step: 215860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:46,547-Speed 3305.74 samples/sec Loss 1.0491 LearningRate 0.0017 Epoch: 17 Global Step: 215870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:49,664-Speed 3286.18 samples/sec Loss 1.0583 LearningRate 0.0017 Epoch: 17 Global Step: 215880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:27:52,737-Speed 3333.66 samples/sec Loss 1.0670 LearningRate 0.0017 Epoch: 17 Global Step: 215890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:55,799-Speed 3344.90 samples/sec Loss 1.0901 LearningRate 0.0017 Epoch: 17 Global Step: 215900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:27:58,878-Speed 3326.61 samples/sec Loss 1.0908 LearningRate 0.0017 Epoch: 17 Global Step: 215910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:01,988-Speed 3294.51 samples/sec Loss 1.0474 LearningRate 0.0017 Epoch: 17 Global Step: 215920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:05,056-Speed 3338.96 samples/sec Loss 1.0263 
LearningRate 0.0017 Epoch: 17 Global Step: 215930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:08,113-Speed 3350.04 samples/sec Loss 1.0825 LearningRate 0.0017 Epoch: 17 Global Step: 215940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:11,270-Speed 3244.89 samples/sec Loss 1.0470 LearningRate 0.0017 Epoch: 17 Global Step: 215950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:14,400-Speed 3273.10 samples/sec Loss 1.0707 LearningRate 0.0017 Epoch: 17 Global Step: 215960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:17,498-Speed 3306.74 samples/sec Loss 1.0993 LearningRate 0.0017 Epoch: 17 Global Step: 215970 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:20,568-Speed 3336.34 samples/sec Loss 1.0652 LearningRate 0.0017 Epoch: 17 Global Step: 215980 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:23,754-Speed 3214.27 samples/sec Loss 1.0506 LearningRate 0.0017 Epoch: 17 Global Step: 215990 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:26,831-Speed 3329.29 samples/sec Loss 1.0844 LearningRate 0.0017 Epoch: 17 Global Step: 216000 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:29,915-Speed 3321.93 samples/sec Loss 1.0737 LearningRate 0.0017 Epoch: 17 Global Step: 216010 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:32,979-Speed 3342.98 samples/sec Loss 1.0459 LearningRate 0.0017 Epoch: 17 Global Step: 216020 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:36,090-Speed 3293.35 samples/sec Loss 1.0648 LearningRate 0.0017 Epoch: 17 Global Step: 216030 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:28:39,170-Speed 3325.59 samples/sec Loss 1.1125 LearningRate 0.0017 Epoch: 17 Global Step: 216040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:42,224-Speed 3353.06 samples/sec Loss 1.0546 LearningRate 0.0017 Epoch: 17 Global Step: 216050 
Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:45,323-Speed 3305.67 samples/sec Loss 1.0598 LearningRate 0.0017 Epoch: 17 Global Step: 216060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:48,462-Speed 3262.83 samples/sec Loss 1.0393 LearningRate 0.0017 Epoch: 17 Global Step: 216070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:51,653-Speed 3210.30 samples/sec Loss 1.0614 LearningRate 0.0017 Epoch: 17 Global Step: 216080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:54,717-Speed 3342.92 samples/sec Loss 1.0449 LearningRate 0.0017 Epoch: 17 Global Step: 216090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:28:57,831-Speed 3290.28 samples/sec Loss 1.0200 LearningRate 0.0017 Epoch: 17 Global Step: 216100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:00,931-Speed 3303.42 samples/sec Loss 1.0190 LearningRate 0.0017 Epoch: 17 Global Step: 216110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:04,007-Speed 3330.07 samples/sec Loss 1.0959 LearningRate 0.0017 Epoch: 17 Global Step: 216120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:07,092-Speed 3320.89 samples/sec Loss 1.0951 LearningRate 0.0017 Epoch: 17 Global Step: 216130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:10,162-Speed 3336.45 samples/sec Loss 1.0593 LearningRate 0.0017 Epoch: 17 Global Step: 216140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:13,239-Speed 3328.62 samples/sec Loss 1.0798 LearningRate 0.0017 Epoch: 17 Global Step: 216150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:16,319-Speed 3325.40 samples/sec Loss 1.0193 LearningRate 0.0017 Epoch: 17 Global Step: 216160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:19,395-Speed 3330.36 samples/sec Loss 1.0689 LearningRate 0.0017 Epoch: 17 Global Step: 216170 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-27 20:29:22,478-Speed 3322.29 samples/sec Loss 1.0769 LearningRate 0.0017 Epoch: 17 Global Step: 216180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:25,652-Speed 3227.64 samples/sec Loss 1.0586 LearningRate 0.0017 Epoch: 17 Global Step: 216190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:28,805-Speed 3248.55 samples/sec Loss 1.0574 LearningRate 0.0017 Epoch: 17 Global Step: 216200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:31,893-Speed 3317.81 samples/sec Loss 1.0304 LearningRate 0.0017 Epoch: 17 Global Step: 216210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:35,028-Speed 3266.64 samples/sec Loss 1.0296 LearningRate 0.0017 Epoch: 17 Global Step: 216220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:29:38,132-Speed 3300.41 samples/sec Loss 1.0702 LearningRate 0.0017 Epoch: 17 Global Step: 216230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:41,294-Speed 3239.71 samples/sec Loss 1.0933 LearningRate 0.0017 Epoch: 17 Global Step: 216240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:44,366-Speed 3334.59 samples/sec Loss 1.0580 LearningRate 0.0017 Epoch: 17 Global Step: 216250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:47,454-Speed 3317.84 samples/sec Loss 1.0627 LearningRate 0.0017 Epoch: 17 Global Step: 216260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:50,506-Speed 3356.03 samples/sec Loss 1.0970 LearningRate 0.0017 Epoch: 17 Global Step: 216270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:53,603-Speed 3308.03 samples/sec Loss 1.0357 LearningRate 0.0017 Epoch: 17 Global Step: 216280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:56,735-Speed 3270.25 samples/sec Loss 1.0211 LearningRate 0.0017 Epoch: 17 Global Step: 216290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:29:59,862-Speed 
3276.22 samples/sec Loss 1.0872 LearningRate 0.0017 Epoch: 17 Global Step: 216300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:03,005-Speed 3258.70 samples/sec Loss 1.0714 LearningRate 0.0017 Epoch: 17 Global Step: 216310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:06,173-Speed 3233.82 samples/sec Loss 1.0598 LearningRate 0.0017 Epoch: 17 Global Step: 216320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:09,263-Speed 3313.79 samples/sec Loss 1.1064 LearningRate 0.0017 Epoch: 17 Global Step: 216330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:30:12,409-Speed 3256.39 samples/sec Loss 1.1081 LearningRate 0.0017 Epoch: 17 Global Step: 216340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:30:15,510-Speed 3303.42 samples/sec Loss 1.0349 LearningRate 0.0017 Epoch: 17 Global Step: 216350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:18,593-Speed 3321.94 samples/sec Loss 1.0370 LearningRate 0.0017 Epoch: 17 Global Step: 216360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:21,662-Speed 3337.41 samples/sec Loss 1.0265 LearningRate 0.0017 Epoch: 17 Global Step: 216370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:24,758-Speed 3308.61 samples/sec Loss 1.0898 LearningRate 0.0017 Epoch: 17 Global Step: 216380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:27,849-Speed 3313.87 samples/sec Loss 1.0509 LearningRate 0.0017 Epoch: 17 Global Step: 216390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:30,968-Speed 3284.27 samples/sec Loss 1.0868 LearningRate 0.0017 Epoch: 17 Global Step: 216400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:34,039-Speed 3335.95 samples/sec Loss 1.0574 LearningRate 0.0017 Epoch: 17 Global Step: 216410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:37,139-Speed 3304.30 samples/sec Loss 1.0992 
LearningRate 0.0017 Epoch: 17 Global Step: 216420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:40,260-Speed 3281.51 samples/sec Loss 1.0866 LearningRate 0.0017 Epoch: 17 Global Step: 216430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:43,361-Speed 3303.59 samples/sec Loss 1.1031 LearningRate 0.0017 Epoch: 17 Global Step: 216440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:46,496-Speed 3267.33 samples/sec Loss 1.0546 LearningRate 0.0017 Epoch: 17 Global Step: 216450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:30:49,635-Speed 3263.17 samples/sec Loss 1.0390 LearningRate 0.0017 Epoch: 17 Global Step: 216460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:52,769-Speed 3268.17 samples/sec Loss 1.0369 LearningRate 0.0017 Epoch: 17 Global Step: 216470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:55,894-Speed 3277.86 samples/sec Loss 1.0452 LearningRate 0.0017 Epoch: 17 Global Step: 216480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:30:58,998-Speed 3300.40 samples/sec Loss 1.0732 LearningRate 0.0017 Epoch: 17 Global Step: 216490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:02,113-Speed 3288.35 samples/sec Loss 1.0653 LearningRate 0.0017 Epoch: 17 Global Step: 216500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:05,265-Speed 3249.87 samples/sec Loss 1.0336 LearningRate 0.0016 Epoch: 17 Global Step: 216510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:08,329-Speed 3342.58 samples/sec Loss 1.0606 LearningRate 0.0016 Epoch: 17 Global Step: 216520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:11,419-Speed 3315.34 samples/sec Loss 1.0521 LearningRate 0.0016 Epoch: 17 Global Step: 216530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:14,532-Speed 3290.75 samples/sec Loss 1.0490 LearningRate 0.0016 Epoch: 17 Global Step: 
216540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:17,648-Speed 3292.39 samples/sec Loss 1.0649 LearningRate 0.0016 Epoch: 17 Global Step: 216550 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:20,734-Speed 3318.71 samples/sec Loss 1.0610 LearningRate 0.0016 Epoch: 17 Global Step: 216560 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:23,843-Speed 3295.21 samples/sec Loss 1.1080 LearningRate 0.0016 Epoch: 17 Global Step: 216570 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:26,988-Speed 3257.02 samples/sec Loss 1.0620 LearningRate 0.0016 Epoch: 17 Global Step: 216580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:30,091-Speed 3301.02 samples/sec Loss 1.0566 LearningRate 0.0016 Epoch: 17 Global Step: 216590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:33,178-Speed 3318.71 samples/sec Loss 1.1078 LearningRate 0.0016 Epoch: 17 Global Step: 216600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:36,280-Speed 3301.38 samples/sec Loss 1.0816 LearningRate 0.0016 Epoch: 17 Global Step: 216610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:39,428-Speed 3253.67 samples/sec Loss 1.0581 LearningRate 0.0016 Epoch: 17 Global Step: 216620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:42,510-Speed 3323.96 samples/sec Loss 1.0640 LearningRate 0.0016 Epoch: 17 Global Step: 216630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:45,617-Speed 3296.63 samples/sec Loss 1.0821 LearningRate 0.0016 Epoch: 17 Global Step: 216640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:31:48,786-Speed 3232.73 samples/sec Loss 1.0759 LearningRate 0.0016 Epoch: 17 Global Step: 216650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:51,888-Speed 3301.40 samples/sec Loss 1.0725 LearningRate 0.0016 Epoch: 17 Global Step: 216660 Fp16 Grad Scale: 16384 Required: 3 hours 
Training: 2022-04-27 20:31:54,994-Speed 3298.31 samples/sec Loss 1.1060 LearningRate 0.0016 Epoch: 17 Global Step: 216670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:31:58,085-Speed 3313.68 samples/sec Loss 1.0781 LearningRate 0.0016 Epoch: 17 Global Step: 216680 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:01,172-Speed 3318.41 samples/sec Loss 1.0583 LearningRate 0.0016 Epoch: 17 Global Step: 216690 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:04,231-Speed 3348.88 samples/sec Loss 1.0823 LearningRate 0.0016 Epoch: 17 Global Step: 216700 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:07,328-Speed 3306.90 samples/sec Loss 1.0765 LearningRate 0.0016 Epoch: 17 Global Step: 216710 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:10,420-Speed 3313.29 samples/sec Loss 1.0986 LearningRate 0.0016 Epoch: 17 Global Step: 216720 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:13,579-Speed 3242.21 samples/sec Loss 1.0774 LearningRate 0.0016 Epoch: 17 Global Step: 216730 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:16,722-Speed 3259.43 samples/sec Loss 1.0226 LearningRate 0.0016 Epoch: 17 Global Step: 216740 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:19,853-Speed 3271.03 samples/sec Loss 1.0874 LearningRate 0.0016 Epoch: 17 Global Step: 216750 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:22,942-Speed 3316.95 samples/sec Loss 1.0522 LearningRate 0.0016 Epoch: 17 Global Step: 216760 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:26,064-Speed 3280.63 samples/sec Loss 1.0450 LearningRate 0.0016 Epoch: 17 Global Step: 216770 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:29,155-Speed 3314.03 samples/sec Loss 1.0412 LearningRate 0.0016 Epoch: 17 Global Step: 216780 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:32,241-Speed 3319.03 
samples/sec Loss 1.0580 LearningRate 0.0016 Epoch: 17 Global Step: 216790 Fp16 Grad Scale: 4096 Required: 3 hours Training: 2022-04-27 20:32:35,343-Speed 3302.69 samples/sec Loss 1.0857 LearningRate 0.0016 Epoch: 17 Global Step: 216800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:38,426-Speed 3322.45 samples/sec Loss 1.0494 LearningRate 0.0016 Epoch: 17 Global Step: 216810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:41,503-Speed 3329.03 samples/sec Loss 1.0458 LearningRate 0.0016 Epoch: 17 Global Step: 216820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:44,606-Speed 3301.10 samples/sec Loss 1.0355 LearningRate 0.0016 Epoch: 17 Global Step: 216830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:47,734-Speed 3274.29 samples/sec Loss 1.0483 LearningRate 0.0016 Epoch: 17 Global Step: 216840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:50,813-Speed 3326.87 samples/sec Loss 1.0305 LearningRate 0.0016 Epoch: 17 Global Step: 216850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:53,887-Speed 3332.75 samples/sec Loss 1.1088 LearningRate 0.0016 Epoch: 17 Global Step: 216860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:32:56,964-Speed 3328.35 samples/sec Loss 1.1114 LearningRate 0.0016 Epoch: 17 Global Step: 216870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:33:00,083-Speed 3284.81 samples/sec Loss 1.0399 LearningRate 0.0016 Epoch: 17 Global Step: 216880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:33:03,167-Speed 3321.06 samples/sec Loss 1.0396 LearningRate 0.0016 Epoch: 17 Global Step: 216890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:33:06,294-Speed 3275.97 samples/sec Loss 1.0871 LearningRate 0.0016 Epoch: 17 Global Step: 216900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:09,364-Speed 3336.61 samples/sec Loss 1.0580 LearningRate 0.0016 Epoch: 17 
Global Step: 216910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:12,456-Speed 3313.76 samples/sec Loss 1.0501 LearningRate 0.0016 Epoch: 17 Global Step: 216920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:15,613-Speed 3244.28 samples/sec Loss 1.0601 LearningRate 0.0016 Epoch: 17 Global Step: 216930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:18,757-Speed 3257.51 samples/sec Loss 1.0592 LearningRate 0.0016 Epoch: 17 Global Step: 216940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:21,852-Speed 3310.08 samples/sec Loss 1.0546 LearningRate 0.0016 Epoch: 17 Global Step: 216950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:24,985-Speed 3269.10 samples/sec Loss 1.0631 LearningRate 0.0016 Epoch: 17 Global Step: 216960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:28,054-Speed 3337.51 samples/sec Loss 1.1258 LearningRate 0.0016 Epoch: 17 Global Step: 216970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:31,193-Speed 3262.39 samples/sec Loss 1.0528 LearningRate 0.0016 Epoch: 17 Global Step: 216980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:34,285-Speed 3313.27 samples/sec Loss 1.0718 LearningRate 0.0016 Epoch: 17 Global Step: 216990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:37,434-Speed 3252.88 samples/sec Loss 1.0410 LearningRate 0.0016 Epoch: 17 Global Step: 217000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:33:40,574-Speed 3262.55 samples/sec Loss 1.0868 LearningRate 0.0016 Epoch: 17 Global Step: 217010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:33:43,644-Speed 3336.93 samples/sec Loss 1.0977 LearningRate 0.0016 Epoch: 17 Global Step: 217020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:46,774-Speed 3272.13 samples/sec Loss 1.0620 LearningRate 0.0016 Epoch: 17 Global Step: 217030 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-04-27 20:33:49,885-Speed 3291.80 samples/sec Loss 1.0852 LearningRate 0.0016 Epoch: 17 Global Step: 217040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:53,009-Speed 3279.48 samples/sec Loss 1.0929 LearningRate 0.0016 Epoch: 17 Global Step: 217050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:56,171-Speed 3239.14 samples/sec Loss 1.0901 LearningRate 0.0016 Epoch: 17 Global Step: 217060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:33:59,290-Speed 3283.90 samples/sec Loss 1.0493 LearningRate 0.0016 Epoch: 17 Global Step: 217070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:02,520-Speed 3171.07 samples/sec Loss 1.0599 LearningRate 0.0016 Epoch: 17 Global Step: 217080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:05,602-Speed 3324.12 samples/sec Loss 1.0745 LearningRate 0.0016 Epoch: 17 Global Step: 217090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:08,720-Speed 3284.48 samples/sec Loss 1.0634 LearningRate 0.0016 Epoch: 17 Global Step: 217100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:11,826-Speed 3297.78 samples/sec Loss 1.0746 LearningRate 0.0016 Epoch: 17 Global Step: 217110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:14,978-Speed 3250.37 samples/sec Loss 1.0946 LearningRate 0.0016 Epoch: 17 Global Step: 217120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:18,157-Speed 3222.36 samples/sec Loss 1.0273 LearningRate 0.0016 Epoch: 17 Global Step: 217130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:21,255-Speed 3306.00 samples/sec Loss 1.0572 LearningRate 0.0016 Epoch: 17 Global Step: 217140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:34:24,346-Speed 3313.17 samples/sec Loss 1.1000 LearningRate 0.0016 Epoch: 17 Global Step: 217150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
20:34:27,546-Speed 3201.73 samples/sec Loss 1.0681 LearningRate 0.0016 Epoch: 17 Global Step: 217160 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:30,618-Speed 3334.80 samples/sec Loss 1.0362 LearningRate 0.0016 Epoch: 17 Global Step: 217170 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:33,720-Speed 3302.10 samples/sec Loss 1.0521 LearningRate 0.0016 Epoch: 17 Global Step: 217180 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:36,876-Speed 3245.09 samples/sec Loss 1.0528 LearningRate 0.0016 Epoch: 17 Global Step: 217190 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:39,991-Speed 3289.01 samples/sec Loss 0.9864 LearningRate 0.0016 Epoch: 17 Global Step: 217200 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:43,142-Speed 3250.61 samples/sec Loss 1.0661 LearningRate 0.0016 Epoch: 17 Global Step: 217210 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:46,232-Speed 3315.13 samples/sec Loss 1.0734 LearningRate 0.0016 Epoch: 17 Global Step: 217220 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:49,327-Speed 3308.74 samples/sec Loss 1.0127 LearningRate 0.0016 Epoch: 17 Global Step: 217230 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:52,514-Speed 3214.40 samples/sec Loss 1.0535 LearningRate 0.0016 Epoch: 17 Global Step: 217240 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:55,608-Speed 3311.43 samples/sec Loss 1.0689 LearningRate 0.0016 Epoch: 17 Global Step: 217250 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:34:58,718-Speed 3293.18 samples/sec Loss 1.1043 LearningRate 0.0016 Epoch: 17 Global Step: 217260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:01,882-Speed 3237.62 samples/sec Loss 1.0240 LearningRate 0.0016 Epoch: 17 Global Step: 217270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:05,043-Speed 3240.00 samples/sec Loss 1.0764 
LearningRate 0.0016 Epoch: 17 Global Step: 217280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:08,209-Speed 3235.33 samples/sec Loss 1.0673 LearningRate 0.0016 Epoch: 17 Global Step: 217290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:11,332-Speed 3280.05 samples/sec Loss 1.0830 LearningRate 0.0016 Epoch: 17 Global Step: 217300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:14,458-Speed 3277.45 samples/sec Loss 1.0402 LearningRate 0.0016 Epoch: 17 Global Step: 217310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:17,566-Speed 3295.34 samples/sec Loss 1.0419 LearningRate 0.0016 Epoch: 17 Global Step: 217320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:20,630-Speed 3343.95 samples/sec Loss 1.0801 LearningRate 0.0016 Epoch: 17 Global Step: 217330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:23,729-Speed 3305.30 samples/sec Loss 1.1005 LearningRate 0.0016 Epoch: 17 Global Step: 217340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:26,854-Speed 3277.66 samples/sec Loss 1.0841 LearningRate 0.0016 Epoch: 17 Global Step: 217350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:29,918-Speed 3343.34 samples/sec Loss 1.0798 LearningRate 0.0016 Epoch: 17 Global Step: 217360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:35:32,954-Speed 3373.00 samples/sec Loss 1.0185 LearningRate 0.0016 Epoch: 17 Global Step: 217370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:36,139-Speed 3216.11 samples/sec Loss 1.0938 LearningRate 0.0016 Epoch: 17 Global Step: 217380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:39,313-Speed 3227.13 samples/sec Loss 1.0633 LearningRate 0.0016 Epoch: 17 Global Step: 217390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:42,460-Speed 3255.45 samples/sec Loss 1.0768 LearningRate 0.0016 Epoch: 17 Global Step: 
217400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:45,574-Speed 3289.09 samples/sec Loss 1.0442 LearningRate 0.0016 Epoch: 17 Global Step: 217410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:48,732-Speed 3243.74 samples/sec Loss 1.0775 LearningRate 0.0016 Epoch: 17 Global Step: 217420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:51,865-Speed 3269.54 samples/sec Loss 1.0109 LearningRate 0.0016 Epoch: 17 Global Step: 217430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:54,944-Speed 3326.78 samples/sec Loss 1.0109 LearningRate 0.0016 Epoch: 17 Global Step: 217440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:35:58,007-Speed 3344.04 samples/sec Loss 1.0698 LearningRate 0.0016 Epoch: 17 Global Step: 217450 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:01,085-Speed 3328.33 samples/sec Loss 1.0544 LearningRate 0.0016 Epoch: 17 Global Step: 217460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:04,243-Speed 3243.58 samples/sec Loss 1.0536 LearningRate 0.0016 Epoch: 17 Global Step: 217470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:07,408-Speed 3236.13 samples/sec Loss 1.0749 LearningRate 0.0016 Epoch: 17 Global Step: 217480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:10,530-Speed 3281.04 samples/sec Loss 1.0473 LearningRate 0.0016 Epoch: 17 Global Step: 217490 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:13,690-Speed 3242.07 samples/sec Loss 1.0218 LearningRate 0.0015 Epoch: 17 Global Step: 217500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:16,788-Speed 3306.42 samples/sec Loss 1.0585 LearningRate 0.0015 Epoch: 17 Global Step: 217510 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:19,925-Speed 3265.35 samples/sec Loss 1.0654 LearningRate 0.0015 Epoch: 17 Global Step: 217520 Fp16 Grad Scale: 8192 Required: 3 hours 
Training: 2022-04-27 20:36:23,010-Speed 3319.93 samples/sec Loss 1.0869 LearningRate 0.0015 Epoch: 17 Global Step: 217530 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:26,101-Speed 3314.39 samples/sec Loss 1.0405 LearningRate 0.0015 Epoch: 17 Global Step: 217540 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:36:29,301-Speed 3200.87 samples/sec Loss 1.0555 LearningRate 0.0015 Epoch: 17 Global Step: 217550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:32,398-Speed 3307.39 samples/sec Loss 1.0656 LearningRate 0.0015 Epoch: 17 Global Step: 217560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:35,516-Speed 3285.01 samples/sec Loss 1.0390 LearningRate 0.0015 Epoch: 17 Global Step: 217570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:38,645-Speed 3272.95 samples/sec Loss 1.0162 LearningRate 0.0015 Epoch: 17 Global Step: 217580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:41,770-Speed 3278.13 samples/sec Loss 1.0280 LearningRate 0.0015 Epoch: 17 Global Step: 217590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:44,861-Speed 3314.89 samples/sec Loss 1.0744 LearningRate 0.0015 Epoch: 17 Global Step: 217600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:48,010-Speed 3252.75 samples/sec Loss 1.0340 LearningRate 0.0015 Epoch: 17 Global Step: 217610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:51,190-Speed 3220.24 samples/sec Loss 1.0482 LearningRate 0.0015 Epoch: 17 Global Step: 217620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:54,318-Speed 3275.40 samples/sec Loss 1.1041 LearningRate 0.0015 Epoch: 17 Global Step: 217630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:36:57,425-Speed 3296.99 samples/sec Loss 1.0714 LearningRate 0.0015 Epoch: 17 Global Step: 217640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:00,556-Speed 
3270.66 samples/sec Loss 1.0555 LearningRate 0.0015 Epoch: 17 Global Step: 217650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:03,723-Speed 3234.96 samples/sec Loss 1.0655 LearningRate 0.0015 Epoch: 17 Global Step: 217660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:06,873-Speed 3251.19 samples/sec Loss 1.0998 LearningRate 0.0015 Epoch: 17 Global Step: 217670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:09,966-Speed 3311.98 samples/sec Loss 1.0987 LearningRate 0.0015 Epoch: 17 Global Step: 217680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:13,106-Speed 3262.27 samples/sec Loss 1.0522 LearningRate 0.0015 Epoch: 17 Global Step: 217690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:16,218-Speed 3291.29 samples/sec Loss 1.0661 LearningRate 0.0015 Epoch: 17 Global Step: 217700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:19,339-Speed 3282.20 samples/sec Loss 1.0693 LearningRate 0.0015 Epoch: 17 Global Step: 217710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:22,495-Speed 3245.77 samples/sec Loss 1.0324 LearningRate 0.0015 Epoch: 17 Global Step: 217720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:25,576-Speed 3324.72 samples/sec Loss 1.0334 LearningRate 0.0015 Epoch: 17 Global Step: 217730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:28,655-Speed 3326.69 samples/sec Loss 1.0439 LearningRate 0.0015 Epoch: 17 Global Step: 217740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:31,776-Speed 3282.02 samples/sec Loss 1.1175 LearningRate 0.0015 Epoch: 17 Global Step: 217750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:34,889-Speed 3290.23 samples/sec Loss 1.0748 LearningRate 0.0015 Epoch: 17 Global Step: 217760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:38,009-Speed 3283.48 samples/sec Loss 1.0997 
LearningRate 0.0015 Epoch: 17 Global Step: 217770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:41,141-Speed 3270.33 samples/sec Loss 1.0723 LearningRate 0.0015 Epoch: 17 Global Step: 217780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:44,282-Speed 3261.10 samples/sec Loss 1.0706 LearningRate 0.0015 Epoch: 17 Global Step: 217790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:47,351-Speed 3337.41 samples/sec Loss 1.0592 LearningRate 0.0015 Epoch: 17 Global Step: 217800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:50,506-Speed 3246.29 samples/sec Loss 1.0467 LearningRate 0.0015 Epoch: 17 Global Step: 217810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:53,631-Speed 3278.57 samples/sec Loss 1.0702 LearningRate 0.0015 Epoch: 17 Global Step: 217820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:37:56,685-Speed 3354.55 samples/sec Loss 1.1117 LearningRate 0.0015 Epoch: 17 Global Step: 217830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:37:59,754-Speed 3336.78 samples/sec Loss 1.0351 LearningRate 0.0015 Epoch: 17 Global Step: 217840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:02,901-Speed 3254.88 samples/sec Loss 1.0658 LearningRate 0.0015 Epoch: 17 Global Step: 217850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:06,049-Speed 3254.53 samples/sec Loss 1.0571 LearningRate 0.0015 Epoch: 17 Global Step: 217860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:09,121-Speed 3333.44 samples/sec Loss 1.0246 LearningRate 0.0015 Epoch: 17 Global Step: 217870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:12,237-Speed 3287.68 samples/sec Loss 1.0581 LearningRate 0.0015 Epoch: 17 Global Step: 217880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:15,313-Speed 3330.50 samples/sec Loss 1.0708 LearningRate 0.0015 Epoch: 17 Global Step: 
217890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:18,470-Speed 3243.91 samples/sec Loss 1.0386 LearningRate 0.0015 Epoch: 17 Global Step: 217900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:21,586-Speed 3287.79 samples/sec Loss 1.0725 LearningRate 0.0015 Epoch: 17 Global Step: 217910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:24,661-Speed 3331.44 samples/sec Loss 1.0815 LearningRate 0.0015 Epoch: 17 Global Step: 217920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:27,759-Speed 3306.34 samples/sec Loss 1.0302 LearningRate 0.0015 Epoch: 17 Global Step: 217930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:38:30,877-Speed 3284.44 samples/sec Loss 1.0494 LearningRate 0.0015 Epoch: 17 Global Step: 217940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:38:33,980-Speed 3301.48 samples/sec Loss 1.0885 LearningRate 0.0015 Epoch: 17 Global Step: 217950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:38:37,084-Speed 3299.94 samples/sec Loss 1.0909 LearningRate 0.0015 Epoch: 17 Global Step: 217960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:38:40,178-Speed 3310.51 samples/sec Loss 1.0237 LearningRate 0.0015 Epoch: 17 Global Step: 217970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:38:43,296-Speed 3285.06 samples/sec Loss 1.0814 LearningRate 0.0015 Epoch: 17 Global Step: 217980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:38:46,343-Speed 3361.76 samples/sec Loss 1.0688 LearningRate 0.0015 Epoch: 17 Global Step: 217990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:38:49,436-Speed 3312.39 samples/sec Loss 1.0541 LearningRate 0.0015 Epoch: 17 Global Step: 218000 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:38:52,543-Speed 3297.11 samples/sec Loss 1.0258 LearningRate 0.0015 Epoch: 17 Global Step: 218010 Fp16 Grad Scale: 8192 Required: 3 
hours Training: 2022-04-27 20:38:55,657-Speed 3289.10 samples/sec Loss 1.0685 LearningRate 0.0015 Epoch: 17 Global Step: 218020 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:38:58,830-Speed 3227.58 samples/sec Loss 1.0755 LearningRate 0.0015 Epoch: 17 Global Step: 218030 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:01,956-Speed 3276.93 samples/sec Loss 1.0553 LearningRate 0.0015 Epoch: 17 Global Step: 218040 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:05,088-Speed 3271.03 samples/sec Loss 1.0246 LearningRate 0.0015 Epoch: 17 Global Step: 218050 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:08,167-Speed 3326.10 samples/sec Loss 1.0607 LearningRate 0.0015 Epoch: 17 Global Step: 218060 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:11,265-Speed 3306.35 samples/sec Loss 1.0546 LearningRate 0.0015 Epoch: 17 Global Step: 218070 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:14,392-Speed 3276.90 samples/sec Loss 1.1067 LearningRate 0.0015 Epoch: 17 Global Step: 218080 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:17,472-Speed 3325.66 samples/sec Loss 1.0704 LearningRate 0.0015 Epoch: 17 Global Step: 218090 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:20,522-Speed 3358.22 samples/sec Loss 1.0514 LearningRate 0.0015 Epoch: 17 Global Step: 218100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:39:23,625-Speed 3300.99 samples/sec Loss 1.0715 LearningRate 0.0015 Epoch: 17 Global Step: 218110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:39:26,773-Speed 3253.90 samples/sec Loss 1.0541 LearningRate 0.0015 Epoch: 17 Global Step: 218120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:39:29,944-Speed 3230.44 samples/sec Loss 1.0379 LearningRate 0.0015 Epoch: 17 Global Step: 218130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:39:33,012-Speed 
3338.25 samples/sec Loss 1.0588 LearningRate 0.0015 Epoch: 17 Global Step: 218140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:39:36,099-Speed 3318.56 samples/sec Loss 1.0778 LearningRate 0.0015 Epoch: 17 Global Step: 218150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:39:39,160-Speed 3346.22 samples/sec Loss 1.0219 LearningRate 0.0015 Epoch: 17 Global Step: 218160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:39:42,242-Speed 3323.98 samples/sec Loss 1.0494 LearningRate 0.0015 Epoch: 17 Global Step: 218170 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:45,309-Speed 3340.23 samples/sec Loss 1.0598 LearningRate 0.0015 Epoch: 17 Global Step: 218180 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:48,413-Speed 3299.89 samples/sec Loss 1.0662 LearningRate 0.0015 Epoch: 17 Global Step: 218190 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:51,497-Speed 3321.08 samples/sec Loss 1.0239 LearningRate 0.0015 Epoch: 17 Global Step: 218200 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:54,578-Speed 3325.49 samples/sec Loss 1.0481 LearningRate 0.0015 Epoch: 17 Global Step: 218210 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:39:57,629-Speed 3356.36 samples/sec Loss 1.0552 LearningRate 0.0015 Epoch: 17 Global Step: 218220 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:40:00,738-Speed 3295.17 samples/sec Loss 1.0333 LearningRate 0.0015 Epoch: 17 Global Step: 218230 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:40:03,830-Speed 3312.22 samples/sec Loss 1.0928 LearningRate 0.0015 Epoch: 17 Global Step: 218240 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:40:06,916-Speed 3320.33 samples/sec Loss 1.0754 LearningRate 0.0015 Epoch: 17 Global Step: 218250 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:40:09,973-Speed 3349.90 samples/sec Loss 1.0179 LearningRate 0.0015 
Epoch: 17 Global Step: 218260 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:40:13,140-Speed 3234.88 samples/sec Loss 1.0443 LearningRate 0.0015 Epoch: 17 Global Step: 218270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:16,224-Speed 3321.20 samples/sec Loss 1.0968 LearningRate 0.0015 Epoch: 17 Global Step: 218280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:19,297-Speed 3333.59 samples/sec Loss 1.0376 LearningRate 0.0015 Epoch: 17 Global Step: 218290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:22,381-Speed 3320.57 samples/sec Loss 1.0910 LearningRate 0.0015 Epoch: 17 Global Step: 218300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:25,456-Speed 3331.89 samples/sec Loss 1.0788 LearningRate 0.0015 Epoch: 17 Global Step: 218310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:28,584-Speed 3274.52 samples/sec Loss 1.0316 LearningRate 0.0015 Epoch: 17 Global Step: 218320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:31,658-Speed 3331.67 samples/sec Loss 1.0306 LearningRate 0.0015 Epoch: 17 Global Step: 218330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:34,728-Speed 3336.81 samples/sec Loss 1.0815 LearningRate 0.0015 Epoch: 17 Global Step: 218340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:37,809-Speed 3324.45 samples/sec Loss 1.0580 LearningRate 0.0015 Epoch: 17 Global Step: 218350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:40,863-Speed 3353.97 samples/sec Loss 1.0802 LearningRate 0.0015 Epoch: 17 Global Step: 218360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:43,959-Speed 3308.76 samples/sec Loss 1.0550 LearningRate 0.0015 Epoch: 17 Global Step: 218370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:40:47,059-Speed 3305.03 samples/sec Loss 1.0409 LearningRate 0.0015 Epoch: 17 Global Step: 218380 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-04-27 20:40:50,165-Speed 3296.99 samples/sec Loss 1.1055 LearningRate 0.0015 Epoch: 17 Global Step: 218390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:40:53,313-Speed 3255.13 samples/sec Loss 1.0763 LearningRate 0.0015 Epoch: 17 Global Step: 218400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:40:56,355-Speed 3367.23 samples/sec Loss 1.0937 LearningRate 0.0015 Epoch: 17 Global Step: 218410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:40:59,426-Speed 3335.80 samples/sec Loss 1.0857 LearningRate 0.0015 Epoch: 17 Global Step: 218420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:41:02,561-Speed 3266.40 samples/sec Loss 1.0783 LearningRate 0.0015 Epoch: 17 Global Step: 218430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:41:05,711-Speed 3252.43 samples/sec Loss 1.1050 LearningRate 0.0015 Epoch: 17 Global Step: 218440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:41:08,752-Speed 3368.02 samples/sec Loss 1.0333 LearningRate 0.0015 Epoch: 17 Global Step: 218450 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:11,914-Speed 3239.02 samples/sec Loss 1.0376 LearningRate 0.0015 Epoch: 17 Global Step: 218460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:15,097-Speed 3218.43 samples/sec Loss 1.0507 LearningRate 0.0015 Epoch: 17 Global Step: 218470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:18,248-Speed 3250.41 samples/sec Loss 1.0309 LearningRate 0.0015 Epoch: 17 Global Step: 218480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:21,322-Speed 3332.74 samples/sec Loss 1.0805 LearningRate 0.0015 Epoch: 17 Global Step: 218490 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:24,465-Speed 3259.18 samples/sec Loss 1.0472 LearningRate 0.0015 Epoch: 17 Global Step: 218500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 
2022-04-27 20:41:27,612-Speed 3254.35 samples/sec Loss 1.0993 LearningRate 0.0014 Epoch: 17 Global Step: 218510 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:30,723-Speed 3293.08 samples/sec Loss 1.0417 LearningRate 0.0014 Epoch: 17 Global Step: 218520 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:33,786-Speed 3344.41 samples/sec Loss 0.9842 LearningRate 0.0014 Epoch: 17 Global Step: 218530 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:36,938-Speed 3249.95 samples/sec Loss 1.0458 LearningRate 0.0014 Epoch: 17 Global Step: 218540 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:40,107-Speed 3231.94 samples/sec Loss 1.0933 LearningRate 0.0014 Epoch: 17 Global Step: 218550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:41:43,193-Speed 3319.02 samples/sec Loss 1.0202 LearningRate 0.0014 Epoch: 17 Global Step: 218560 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:46,330-Speed 3266.19 samples/sec Loss 1.0375 LearningRate 0.0014 Epoch: 17 Global Step: 218570 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:49,500-Speed 3231.42 samples/sec Loss 1.0670 LearningRate 0.0014 Epoch: 17 Global Step: 218580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:52,646-Speed 3255.89 samples/sec Loss 1.0444 LearningRate 0.0014 Epoch: 17 Global Step: 218590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:55,747-Speed 3303.41 samples/sec Loss 1.0576 LearningRate 0.0014 Epoch: 17 Global Step: 218600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:41:58,900-Speed 3248.74 samples/sec Loss 1.0771 LearningRate 0.0014 Epoch: 17 Global Step: 218610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:02,045-Speed 3257.49 samples/sec Loss 1.0617 LearningRate 0.0014 Epoch: 17 Global Step: 218620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:05,134-Speed 3316.04 samples/sec Loss 
1.1047 LearningRate 0.0014 Epoch: 17 Global Step: 218630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:08,248-Speed 3288.95 samples/sec Loss 1.0722 LearningRate 0.0014 Epoch: 17 Global Step: 218640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:11,370-Speed 3281.32 samples/sec Loss 1.1026 LearningRate 0.0014 Epoch: 17 Global Step: 218650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:14,498-Speed 3274.75 samples/sec Loss 1.0073 LearningRate 0.0014 Epoch: 17 Global Step: 218660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:17,632-Speed 3268.11 samples/sec Loss 1.0802 LearningRate 0.0014 Epoch: 17 Global Step: 218670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:20,726-Speed 3310.36 samples/sec Loss 1.0366 LearningRate 0.0014 Epoch: 17 Global Step: 218680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:23,820-Speed 3310.42 samples/sec Loss 1.0772 LearningRate 0.0014 Epoch: 17 Global Step: 218690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:26,970-Speed 3252.18 samples/sec Loss 1.0080 LearningRate 0.0014 Epoch: 17 Global Step: 218700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:30,095-Speed 3278.09 samples/sec Loss 1.0754 LearningRate 0.0014 Epoch: 17 Global Step: 218710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:33,154-Speed 3348.97 samples/sec Loss 1.0486 LearningRate 0.0014 Epoch: 17 Global Step: 218720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:36,251-Speed 3306.98 samples/sec Loss 1.0529 LearningRate 0.0014 Epoch: 17 Global Step: 218730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:42:39,353-Speed 3302.15 samples/sec Loss 1.0031 LearningRate 0.0014 Epoch: 17 Global Step: 218740 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:42,511-Speed 3243.70 samples/sec Loss 1.0642 LearningRate 0.0014 Epoch: 17 Global 
Step: 218750 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:45,577-Speed 3341.35 samples/sec Loss 1.0432 LearningRate 0.0014 Epoch: 17 Global Step: 218760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:48,681-Speed 3299.34 samples/sec Loss 1.0311 LearningRate 0.0014 Epoch: 17 Global Step: 218770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:51,770-Speed 3316.51 samples/sec Loss 1.0854 LearningRate 0.0014 Epoch: 17 Global Step: 218780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:54,831-Speed 3345.77 samples/sec Loss 1.0837 LearningRate 0.0014 Epoch: 17 Global Step: 218790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:42:57,916-Speed 3320.74 samples/sec Loss 1.0429 LearningRate 0.0014 Epoch: 17 Global Step: 218800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:01,043-Speed 3275.28 samples/sec Loss 1.0808 LearningRate 0.0014 Epoch: 17 Global Step: 218810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:04,190-Speed 3255.93 samples/sec Loss 1.1031 LearningRate 0.0014 Epoch: 17 Global Step: 218820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:07,314-Speed 3278.44 samples/sec Loss 1.0759 LearningRate 0.0014 Epoch: 17 Global Step: 218830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:10,442-Speed 3275.01 samples/sec Loss 1.0587 LearningRate 0.0014 Epoch: 17 Global Step: 218840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:43:13,571-Speed 3273.17 samples/sec Loss 1.0673 LearningRate 0.0014 Epoch: 17 Global Step: 218850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:43:16,758-Speed 3214.69 samples/sec Loss 1.0664 LearningRate 0.0014 Epoch: 17 Global Step: 218860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:43:19,907-Speed 3252.17 samples/sec Loss 1.0213 LearningRate 0.0014 Epoch: 17 Global Step: 218870 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-27 20:43:22,977-Speed 3336.99 samples/sec Loss 1.0413 LearningRate 0.0014 Epoch: 17 Global Step: 218880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:26,129-Speed 3250.09 samples/sec Loss 1.0697 LearningRate 0.0014 Epoch: 17 Global Step: 218890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:29,203-Speed 3332.02 samples/sec Loss 1.0730 LearningRate 0.0014 Epoch: 17 Global Step: 218900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:32,271-Speed 3338.72 samples/sec Loss 1.0797 LearningRate 0.0014 Epoch: 17 Global Step: 218910 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:35,376-Speed 3298.74 samples/sec Loss 1.1123 LearningRate 0.0014 Epoch: 17 Global Step: 218920 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:38,462-Speed 3319.80 samples/sec Loss 1.0734 LearningRate 0.0014 Epoch: 17 Global Step: 218930 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:41,576-Speed 3288.86 samples/sec Loss 1.0638 LearningRate 0.0014 Epoch: 17 Global Step: 218940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:44,684-Speed 3295.61 samples/sec Loss 1.0300 LearningRate 0.0014 Epoch: 17 Global Step: 218950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:47,809-Speed 3278.18 samples/sec Loss 1.0340 LearningRate 0.0014 Epoch: 17 Global Step: 218960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:50,988-Speed 3221.92 samples/sec Loss 1.0627 LearningRate 0.0014 Epoch: 17 Global Step: 218970 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:43:54,135-Speed 3255.38 samples/sec Loss 1.0438 LearningRate 0.0014 Epoch: 17 Global Step: 218980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:43:57,209-Speed 3331.96 samples/sec Loss 1.0227 LearningRate 0.0014 Epoch: 17 Global Step: 218990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:00,363-Speed 3248.06 
samples/sec Loss 1.0297 LearningRate 0.0014 Epoch: 17 Global Step: 219000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:03,536-Speed 3228.37 samples/sec Loss 1.0695 LearningRate 0.0014 Epoch: 17 Global Step: 219010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:06,598-Speed 3345.08 samples/sec Loss 1.1051 LearningRate 0.0014 Epoch: 17 Global Step: 219020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:09,649-Speed 3357.51 samples/sec Loss 1.0605 LearningRate 0.0014 Epoch: 17 Global Step: 219030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:12,777-Speed 3274.15 samples/sec Loss 1.0555 LearningRate 0.0014 Epoch: 17 Global Step: 219040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:15,922-Speed 3257.15 samples/sec Loss 1.0440 LearningRate 0.0014 Epoch: 17 Global Step: 219050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:19,077-Speed 3246.16 samples/sec Loss 1.0528 LearningRate 0.0014 Epoch: 17 Global Step: 219060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:22,145-Speed 3339.06 samples/sec Loss 1.0780 LearningRate 0.0014 Epoch: 17 Global Step: 219070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:25,267-Speed 3281.06 samples/sec Loss 1.0942 LearningRate 0.0014 Epoch: 17 Global Step: 219080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:44:28,361-Speed 3310.29 samples/sec Loss 1.0414 LearningRate 0.0014 Epoch: 17 Global Step: 219090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:44:31,451-Speed 3315.56 samples/sec Loss 1.0656 LearningRate 0.0014 Epoch: 17 Global Step: 219100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:44:34,537-Speed 3319.13 samples/sec Loss 1.0738 LearningRate 0.0014 Epoch: 17 Global Step: 219110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:44:37,672-Speed 3266.96 samples/sec Loss 1.0520 LearningRate 0.0014 
Epoch: 17 Global Step: 219120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:44:40,758-Speed 3319.33 samples/sec Loss 1.0408 LearningRate 0.0014 Epoch: 17 Global Step: 219130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:44:43,825-Speed 3340.08 samples/sec Loss 1.0966 LearningRate 0.0014 Epoch: 17 Global Step: 219140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:44:46,918-Speed 3312.54 samples/sec Loss 1.0796 LearningRate 0.0014 Epoch: 17 Global Step: 219150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:50,020-Speed 3301.26 samples/sec Loss 1.0989 LearningRate 0.0014 Epoch: 17 Global Step: 219160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:53,169-Speed 3253.27 samples/sec Loss 1.0651 LearningRate 0.0014 Epoch: 17 Global Step: 219170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:56,268-Speed 3305.59 samples/sec Loss 1.0921 LearningRate 0.0014 Epoch: 17 Global Step: 219180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:44:59,378-Speed 3293.17 samples/sec Loss 1.0415 LearningRate 0.0014 Epoch: 17 Global Step: 219190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:02,535-Speed 3244.93 samples/sec Loss 1.0731 LearningRate 0.0014 Epoch: 17 Global Step: 219200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:05,634-Speed 3305.07 samples/sec Loss 1.0770 LearningRate 0.0014 Epoch: 17 Global Step: 219210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:08,695-Speed 3346.79 samples/sec Loss 1.0844 LearningRate 0.0014 Epoch: 17 Global Step: 219220 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:11,804-Speed 3293.93 samples/sec Loss 1.0419 LearningRate 0.0014 Epoch: 17 Global Step: 219230 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:14,946-Speed 3260.82 samples/sec Loss 1.0723 LearningRate 0.0014 Epoch: 17 Global Step: 219240 Fp16 Grad 
Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:18,035-Speed 3316.23 samples/sec Loss 1.0895 LearningRate 0.0014 Epoch: 17 Global Step: 219250 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:21,099-Speed 3342.83 samples/sec Loss 1.0010 LearningRate 0.0014 Epoch: 17 Global Step: 219260 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:24,239-Speed 3261.46 samples/sec Loss 1.0478 LearningRate 0.0014 Epoch: 17 Global Step: 219270 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:27,414-Speed 3226.56 samples/sec Loss 1.0589 LearningRate 0.0014 Epoch: 17 Global Step: 219280 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:30,496-Speed 3323.65 samples/sec Loss 1.0440 LearningRate 0.0014 Epoch: 17 Global Step: 219290 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:33,584-Speed 3316.37 samples/sec Loss 1.0128 LearningRate 0.0014 Epoch: 17 Global Step: 219300 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:36,766-Speed 3220.30 samples/sec Loss 1.0773 LearningRate 0.0014 Epoch: 17 Global Step: 219310 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 20:45:39,836-Speed 3336.59 samples/sec Loss 1.0759 LearningRate 0.0014 Epoch: 17 Global Step: 219320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:42,930-Speed 3309.81 samples/sec Loss 1.0583 LearningRate 0.0014 Epoch: 17 Global Step: 219330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:46,020-Speed 3314.94 samples/sec Loss 1.0611 LearningRate 0.0014 Epoch: 17 Global Step: 219340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:49,204-Speed 3217.52 samples/sec Loss 1.0429 LearningRate 0.0014 Epoch: 17 Global Step: 219350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:52,395-Speed 3209.48 samples/sec Loss 1.1078 LearningRate 0.0014 Epoch: 17 Global Step: 219360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
20:45:55,508-Speed 3290.85 samples/sec Loss 1.0433 LearningRate 0.0014 Epoch: 17 Global Step: 219370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:45:58,552-Speed 3364.95 samples/sec Loss 1.0682 LearningRate 0.0014 Epoch: 17 Global Step: 219380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:01,607-Speed 3353.28 samples/sec Loss 1.0502 LearningRate 0.0014 Epoch: 17 Global Step: 219390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:04,791-Speed 3216.62 samples/sec Loss 1.0123 LearningRate 0.0014 Epoch: 17 Global Step: 219400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:07,909-Speed 3286.22 samples/sec Loss 1.0756 LearningRate 0.0014 Epoch: 17 Global Step: 219410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:10,956-Speed 3361.71 samples/sec Loss 1.0830 LearningRate 0.0014 Epoch: 17 Global Step: 219420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:14,021-Speed 3341.60 samples/sec Loss 1.0329 LearningRate 0.0014 Epoch: 17 Global Step: 219430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:17,148-Speed 3276.00 samples/sec Loss 1.0367 LearningRate 0.0014 Epoch: 17 Global Step: 219440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:20,242-Speed 3310.73 samples/sec Loss 1.0830 LearningRate 0.0014 Epoch: 17 Global Step: 219450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:23,369-Speed 3274.83 samples/sec Loss 1.0704 LearningRate 0.0014 Epoch: 17 Global Step: 219460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:26,457-Speed 3317.72 samples/sec Loss 1.0939 LearningRate 0.0014 Epoch: 17 Global Step: 219470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:29,579-Speed 3281.25 samples/sec Loss 1.0382 LearningRate 0.0014 Epoch: 17 Global Step: 219480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:32,728-Speed 3252.69 samples/sec Loss 
1.0332 LearningRate 0.0014 Epoch: 17 Global Step: 219490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:35,905-Speed 3224.34 samples/sec Loss 1.0433 LearningRate 0.0014 Epoch: 17 Global Step: 219500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:38,997-Speed 3312.39 samples/sec Loss 1.0507 LearningRate 0.0014 Epoch: 17 Global Step: 219510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 20:46:42,117-Speed 3282.65 samples/sec Loss 1.0065 LearningRate 0.0014 Epoch: 17 Global Step: 219520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 20:46:45,196-Speed 3327.94 samples/sec Loss 1.0639 LearningRate 0.0014 Epoch: 17 Global Step: 219530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:46:48,326-Speed 3272.22 samples/sec Loss 1.0532 LearningRate 0.0014 Epoch: 17 Global Step: 219540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:46:51,464-Speed 3263.98 samples/sec Loss 1.0518 LearningRate 0.0014 Epoch: 17 Global Step: 219550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:46:54,623-Speed 3242.87 samples/sec Loss 1.0936 LearningRate 0.0013 Epoch: 17 Global Step: 219560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:46:57,703-Speed 3326.07 samples/sec Loss 1.0164 LearningRate 0.0013 Epoch: 17 Global Step: 219570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:00,781-Speed 3326.79 samples/sec Loss 1.0373 LearningRate 0.0013 Epoch: 17 Global Step: 219580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:03,868-Speed 3319.15 samples/sec Loss 1.0849 LearningRate 0.0013 Epoch: 17 Global Step: 219590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:06,977-Speed 3294.17 samples/sec Loss 1.0283 LearningRate 0.0013 Epoch: 17 Global Step: 219600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:10,038-Speed 3346.76 samples/sec Loss 1.0138 LearningRate 0.0013 Epoch: 17 Global 
Step: 219610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:13,174-Speed 3266.45 samples/sec Loss 1.0563 LearningRate 0.0013 Epoch: 17 Global Step: 219620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:16,299-Speed 3277.59 samples/sec Loss 1.0747 LearningRate 0.0013 Epoch: 17 Global Step: 219630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:19,445-Speed 3255.54 samples/sec Loss 1.0364 LearningRate 0.0013 Epoch: 17 Global Step: 219640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:22,558-Speed 3291.38 samples/sec Loss 1.0361 LearningRate 0.0013 Epoch: 17 Global Step: 219650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:47:25,682-Speed 3278.02 samples/sec Loss 1.0476 LearningRate 0.0013 Epoch: 17 Global Step: 219660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:28,814-Speed 3271.05 samples/sec Loss 1.0608 LearningRate 0.0013 Epoch: 17 Global Step: 219670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:31,929-Speed 3288.25 samples/sec Loss 1.0425 LearningRate 0.0013 Epoch: 17 Global Step: 219680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:35,008-Speed 3327.36 samples/sec Loss 1.0425 LearningRate 0.0013 Epoch: 17 Global Step: 219690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:38,219-Speed 3189.53 samples/sec Loss 1.0545 LearningRate 0.0013 Epoch: 17 Global Step: 219700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:41,312-Speed 3311.70 samples/sec Loss 0.9943 LearningRate 0.0013 Epoch: 17 Global Step: 219710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:44,422-Speed 3294.00 samples/sec Loss 1.1096 LearningRate 0.0013 Epoch: 17 Global Step: 219720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:47:47,544-Speed 3280.41 samples/sec Loss 1.0438 LearningRate 0.0013 Epoch: 17 Global Step: 219730 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 20:47:50,609-Speed 3341.96 samples/sec Loss 1.0901 LearningRate 0.0013 Epoch: 17 Global Step: 219740 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:47:53,740-Speed 3271.26 samples/sec Loss 1.0566 LearningRate 0.0013 Epoch: 17 Global Step: 219750 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:47:56,796-Speed 3352.30 samples/sec Loss 1.0711 LearningRate 0.0013 Epoch: 17 Global Step: 219760 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:47:59,890-Speed 3310.67 samples/sec Loss 1.0441 LearningRate 0.0013 Epoch: 17 Global Step: 219770 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:48:02,984-Speed 3310.66 samples/sec Loss 1.0825 LearningRate 0.0013 Epoch: 17 Global Step: 219780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:48:06,142-Speed 3243.55 samples/sec Loss 1.0693 LearningRate 0.0013 Epoch: 17 Global Step: 219790 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:48:09,219-Speed 3328.89 samples/sec Loss 1.0764 LearningRate 0.0013 Epoch: 17 Global Step: 219800 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:48:12,316-Speed 3307.67 samples/sec Loss 1.0358 LearningRate 0.0013 Epoch: 17 Global Step: 219810 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:48:15,471-Speed 3246.24 samples/sec Loss 1.0925 LearningRate 0.0013 Epoch: 17 Global Step: 219820 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:48:18,664-Speed 3208.01 samples/sec Loss 1.0726 LearningRate 0.0013 Epoch: 17 Global Step: 219830 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:48:21,740-Speed 3329.96 samples/sec Loss 1.0556 LearningRate 0.0013 Epoch: 17 Global Step: 219840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:24,944-Speed 3197.52 samples/sec Loss 1.0932 LearningRate 0.0013 Epoch: 17 Global Step: 219850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
20:48:28,064-Speed 3282.83 samples/sec Loss 1.0567 LearningRate 0.0013 Epoch: 17 Global Step: 219860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:31,198-Speed 3267.90 samples/sec Loss 1.1014 LearningRate 0.0013 Epoch: 17 Global Step: 219870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:34,277-Speed 3327.67 samples/sec Loss 1.0560 LearningRate 0.0013 Epoch: 17 Global Step: 219880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:37,395-Speed 3285.32 samples/sec Loss 1.0514 LearningRate 0.0013 Epoch: 17 Global Step: 219890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:40,552-Speed 3244.09 samples/sec Loss 1.0821 LearningRate 0.0013 Epoch: 17 Global Step: 219900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:43,667-Speed 3288.67 samples/sec Loss 1.0803 LearningRate 0.0013 Epoch: 17 Global Step: 219910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:46,739-Speed 3335.06 samples/sec Loss 1.0568 LearningRate 0.0013 Epoch: 17 Global Step: 219920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:49,865-Speed 3276.38 samples/sec Loss 1.0895 LearningRate 0.0013 Epoch: 17 Global Step: 219930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:48:53,037-Speed 3229.15 samples/sec Loss 1.1182 LearningRate 0.0013 Epoch: 17 Global Step: 219940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:48:56,130-Speed 3311.42 samples/sec Loss 1.0484 LearningRate 0.0013 Epoch: 17 Global Step: 219950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:48:59,350-Speed 3181.33 samples/sec Loss 1.0192 LearningRate 0.0013 Epoch: 17 Global Step: 219960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:49:02,471-Speed 3281.85 samples/sec Loss 1.0245 LearningRate 0.0013 Epoch: 17 Global Step: 219970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:49:05,603-Speed 3270.19 samples/sec Loss 
1.0422 LearningRate 0.0013 Epoch: 17 Global Step: 219980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:49:08,713-Speed 3294.27 samples/sec Loss 1.0597 LearningRate 0.0013 Epoch: 17 Global Step: 219990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:49:11,845-Speed 3270.15 samples/sec Loss 1.0698 LearningRate 0.0013 Epoch: 17 Global Step: 220000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:14,995-Speed 3252.28 samples/sec Loss 1.0288 LearningRate 0.0013 Epoch: 17 Global Step: 220010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:18,179-Speed 3217.08 samples/sec Loss 1.0697 LearningRate 0.0013 Epoch: 17 Global Step: 220020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:21,272-Speed 3311.11 samples/sec Loss 1.0575 LearningRate 0.0013 Epoch: 17 Global Step: 220030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:24,401-Speed 3273.89 samples/sec Loss 1.1025 LearningRate 0.0013 Epoch: 17 Global Step: 220040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:27,559-Speed 3243.71 samples/sec Loss 1.0386 LearningRate 0.0013 Epoch: 17 Global Step: 220050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:30,642-Speed 3322.76 samples/sec Loss 1.0044 LearningRate 0.0013 Epoch: 17 Global Step: 220060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:33,801-Speed 3242.27 samples/sec Loss 0.9969 LearningRate 0.0013 Epoch: 17 Global Step: 220070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:49:36,902-Speed 3303.51 samples/sec Loss 1.0516 LearningRate 0.0013 Epoch: 17 Global Step: 220080 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:49:39,983-Speed 3324.40 samples/sec Loss 1.0481 LearningRate 0.0013 Epoch: 17 Global Step: 220090 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:49:43,081-Speed 3305.97 samples/sec Loss 1.0293 LearningRate 0.0013 Epoch: 17 Global 
Step: 220100 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:49:46,183-Speed 3302.59 samples/sec Loss 1.0754 LearningRate 0.0013 Epoch: 17 Global Step: 220110 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:49:49,357-Speed 3227.18 samples/sec Loss 1.0640 LearningRate 0.0013 Epoch: 17 Global Step: 220120 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:49:52,477-Speed 3283.56 samples/sec Loss 1.0342 LearningRate 0.0013 Epoch: 17 Global Step: 220130 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:49:55,572-Speed 3309.31 samples/sec Loss 1.0654 LearningRate 0.0013 Epoch: 17 Global Step: 220140 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:49:58,655-Speed 3322.73 samples/sec Loss 0.9977 LearningRate 0.0013 Epoch: 17 Global Step: 220150 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:01,788-Speed 3269.28 samples/sec Loss 1.0563 LearningRate 0.0013 Epoch: 17 Global Step: 220160 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:04,907-Speed 3284.49 samples/sec Loss 1.0733 LearningRate 0.0013 Epoch: 17 Global Step: 220170 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:08,030-Speed 3279.24 samples/sec Loss 1.0271 LearningRate 0.0013 Epoch: 17 Global Step: 220180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:11,141-Speed 3292.19 samples/sec Loss 1.0232 LearningRate 0.0013 Epoch: 17 Global Step: 220190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:14,265-Speed 3279.33 samples/sec Loss 1.0846 LearningRate 0.0013 Epoch: 17 Global Step: 220200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:17,355-Speed 3314.98 samples/sec Loss 1.0626 LearningRate 0.0013 Epoch: 17 Global Step: 220210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:20,474-Speed 3284.12 samples/sec Loss 1.0414 LearningRate 0.0013 Epoch: 17 Global Step: 220220 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-04-27 20:50:23,575-Speed 3303.08 samples/sec Loss 1.0406 LearningRate 0.0013 Epoch: 17 Global Step: 220230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:26,691-Speed 3287.31 samples/sec Loss 1.0402 LearningRate 0.0013 Epoch: 17 Global Step: 220240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:29,880-Speed 3212.11 samples/sec Loss 1.0643 LearningRate 0.0013 Epoch: 17 Global Step: 220250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:32,979-Speed 3304.74 samples/sec Loss 1.0730 LearningRate 0.0013 Epoch: 17 Global Step: 220260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:50:36,064-Speed 3320.83 samples/sec Loss 1.0389 LearningRate 0.0013 Epoch: 17 Global Step: 220270 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:39,185-Speed 3282.76 samples/sec Loss 1.0281 LearningRate 0.0013 Epoch: 17 Global Step: 220280 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:42,381-Speed 3204.74 samples/sec Loss 1.0990 LearningRate 0.0013 Epoch: 17 Global Step: 220290 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:45,467-Speed 3318.62 samples/sec Loss 1.0642 LearningRate 0.0013 Epoch: 17 Global Step: 220300 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:48,591-Speed 3281.10 samples/sec Loss 1.0389 LearningRate 0.0013 Epoch: 17 Global Step: 220310 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:51,692-Speed 3303.45 samples/sec Loss 1.0426 LearningRate 0.0013 Epoch: 17 Global Step: 220320 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:54,809-Speed 3285.51 samples/sec Loss 1.0489 LearningRate 0.0013 Epoch: 17 Global Step: 220330 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:50:57,898-Speed 3316.80 samples/sec Loss 1.0657 LearningRate 0.0013 Epoch: 17 Global Step: 220340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:01,014-Speed 
3287.40 samples/sec Loss 1.0838 LearningRate 0.0013 Epoch: 17 Global Step: 220350 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:04,142-Speed 3274.31 samples/sec Loss 1.0917 LearningRate 0.0013 Epoch: 17 Global Step: 220360 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:07,223-Speed 3325.91 samples/sec Loss 1.0292 LearningRate 0.0013 Epoch: 17 Global Step: 220370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:51:10,276-Speed 3354.66 samples/sec Loss 1.0321 LearningRate 0.0013 Epoch: 17 Global Step: 220380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:51:13,465-Speed 3211.73 samples/sec Loss 1.0518 LearningRate 0.0013 Epoch: 17 Global Step: 220390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:51:16,644-Speed 3222.30 samples/sec Loss 1.0673 LearningRate 0.0013 Epoch: 17 Global Step: 220400 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:19,719-Speed 3331.51 samples/sec Loss 1.0722 LearningRate 0.0013 Epoch: 17 Global Step: 220410 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:22,793-Speed 3332.80 samples/sec Loss 1.0390 LearningRate 0.0013 Epoch: 17 Global Step: 220420 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:25,859-Speed 3340.41 samples/sec Loss 1.0608 LearningRate 0.0013 Epoch: 17 Global Step: 220430 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:28,982-Speed 3280.22 samples/sec Loss 1.0645 LearningRate 0.0013 Epoch: 17 Global Step: 220440 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:32,091-Speed 3294.59 samples/sec Loss 1.0427 LearningRate 0.0013 Epoch: 17 Global Step: 220450 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:35,155-Speed 3342.61 samples/sec Loss 1.0400 LearningRate 0.0013 Epoch: 17 Global Step: 220460 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:38,295-Speed 3262.58 samples/sec Loss 1.0609 LearningRate 0.0013 
Epoch: 17 Global Step: 220470 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:41,424-Speed 3273.10 samples/sec Loss 1.0285 LearningRate 0.0013 Epoch: 17 Global Step: 220480 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:44,517-Speed 3311.62 samples/sec Loss 1.0022 LearningRate 0.0013 Epoch: 17 Global Step: 220490 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:51:47,605-Speed 3317.85 samples/sec Loss 1.0512 LearningRate 0.0013 Epoch: 17 Global Step: 220500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:51:50,703-Speed 3306.42 samples/sec Loss 1.0364 LearningRate 0.0013 Epoch: 17 Global Step: 220510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:51:53,780-Speed 3328.39 samples/sec Loss 1.0455 LearningRate 0.0013 Epoch: 17 Global Step: 220520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:51:56,872-Speed 3313.08 samples/sec Loss 1.0102 LearningRate 0.0013 Epoch: 17 Global Step: 220530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:51:59,944-Speed 3333.96 samples/sec Loss 1.0464 LearningRate 0.0013 Epoch: 17 Global Step: 220540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:03,154-Speed 3191.23 samples/sec Loss 1.0521 LearningRate 0.0013 Epoch: 17 Global Step: 220550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:06,282-Speed 3274.95 samples/sec Loss 1.0489 LearningRate 0.0013 Epoch: 17 Global Step: 220560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:09,351-Speed 3337.50 samples/sec Loss 1.1009 LearningRate 0.0013 Epoch: 17 Global Step: 220570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:12,400-Speed 3359.99 samples/sec Loss 1.0423 LearningRate 0.0013 Epoch: 17 Global Step: 220580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:15,547-Speed 3254.03 samples/sec Loss 1.0494 LearningRate 0.0013 Epoch: 17 Global Step: 220590 Fp16 Grad Scale: 
16384 Required: 2 hours Training: 2022-04-27 20:52:18,728-Speed 3220.84 samples/sec Loss 1.0780 LearningRate 0.0013 Epoch: 17 Global Step: 220600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:52:21,807-Speed 3326.39 samples/sec Loss 1.0599 LearningRate 0.0013 Epoch: 17 Global Step: 220610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:52:24,870-Speed 3344.85 samples/sec Loss 1.0349 LearningRate 0.0013 Epoch: 17 Global Step: 220620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:27,958-Speed 3317.34 samples/sec Loss 1.0177 LearningRate 0.0013 Epoch: 17 Global Step: 220630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:31,040-Speed 3323.01 samples/sec Loss 1.0365 LearningRate 0.0013 Epoch: 17 Global Step: 220640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:34,122-Speed 3323.86 samples/sec Loss 1.0528 LearningRate 0.0012 Epoch: 17 Global Step: 220650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:37,275-Speed 3249.18 samples/sec Loss 1.0238 LearningRate 0.0012 Epoch: 17 Global Step: 220660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:40,469-Speed 3206.51 samples/sec Loss 1.0211 LearningRate 0.0012 Epoch: 17 Global Step: 220670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:43,561-Speed 3313.10 samples/sec Loss 1.0745 LearningRate 0.0012 Epoch: 17 Global Step: 220680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:46,646-Speed 3320.59 samples/sec Loss 1.0525 LearningRate 0.0012 Epoch: 17 Global Step: 220690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:49,818-Speed 3228.35 samples/sec Loss 1.0365 LearningRate 0.0012 Epoch: 17 Global Step: 220700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:52,953-Speed 3268.23 samples/sec Loss 1.0554 LearningRate 0.0012 Epoch: 17 Global Step: 220710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-04-27 20:52:56,056-Speed 3300.18 samples/sec Loss 1.0685 LearningRate 0.0012 Epoch: 17 Global Step: 220720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:52:59,179-Speed 3280.51 samples/sec Loss 1.0731 LearningRate 0.0012 Epoch: 17 Global Step: 220730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:02,313-Speed 3268.32 samples/sec Loss 1.0346 LearningRate 0.0012 Epoch: 17 Global Step: 220740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:05,526-Speed 3188.52 samples/sec Loss 1.1002 LearningRate 0.0012 Epoch: 17 Global Step: 220750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:08,600-Speed 3331.58 samples/sec Loss 1.0077 LearningRate 0.0012 Epoch: 17 Global Step: 220760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:11,694-Speed 3311.06 samples/sec Loss 1.0450 LearningRate 0.0012 Epoch: 17 Global Step: 220770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:14,804-Speed 3294.02 samples/sec Loss 1.0585 LearningRate 0.0012 Epoch: 17 Global Step: 220780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:17,954-Speed 3252.11 samples/sec Loss 1.1032 LearningRate 0.0012 Epoch: 17 Global Step: 220790 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:21,059-Speed 3298.39 samples/sec Loss 1.0457 LearningRate 0.0012 Epoch: 17 Global Step: 220800 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:24,143-Speed 3321.43 samples/sec Loss 1.0447 LearningRate 0.0012 Epoch: 17 Global Step: 220810 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:27,302-Speed 3243.17 samples/sec Loss 1.0330 LearningRate 0.0012 Epoch: 17 Global Step: 220820 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:30,480-Speed 3223.08 samples/sec Loss 1.0257 LearningRate 0.0012 Epoch: 17 Global Step: 220830 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:33,592-Speed 3293.79 samples/sec 
Loss 1.0338 LearningRate 0.0012 Epoch: 17 Global Step: 220840 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:36,762-Speed 3230.95 samples/sec Loss 1.0619 LearningRate 0.0012 Epoch: 17 Global Step: 220850 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:39,865-Speed 3301.20 samples/sec Loss 1.0758 LearningRate 0.0012 Epoch: 17 Global Step: 220860 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:42,998-Speed 3268.93 samples/sec Loss 1.0331 LearningRate 0.0012 Epoch: 17 Global Step: 220870 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:53:46,064-Speed 3341.40 samples/sec Loss 1.0623 LearningRate 0.0012 Epoch: 17 Global Step: 220880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:49,284-Speed 3181.32 samples/sec Loss 1.0535 LearningRate 0.0012 Epoch: 17 Global Step: 220890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:52,431-Speed 3254.54 samples/sec Loss 1.1170 LearningRate 0.0012 Epoch: 17 Global Step: 220900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:55,616-Speed 3216.43 samples/sec Loss 1.0508 LearningRate 0.0012 Epoch: 17 Global Step: 220910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:53:58,709-Speed 3311.17 samples/sec Loss 0.9729 LearningRate 0.0012 Epoch: 17 Global Step: 220920 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:01,831-Speed 3280.90 samples/sec Loss 1.0474 LearningRate 0.0012 Epoch: 17 Global Step: 220930 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:04,971-Speed 3263.08 samples/sec Loss 1.0488 LearningRate 0.0012 Epoch: 17 Global Step: 220940 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:08,104-Speed 3269.54 samples/sec Loss 1.1049 LearningRate 0.0012 Epoch: 17 Global Step: 220950 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:11,227-Speed 3280.00 samples/sec Loss 1.0989 LearningRate 0.0012 Epoch: 17 Global 
Step: 220960 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:14,390-Speed 3238.23 samples/sec Loss 1.0104 LearningRate 0.0012 Epoch: 17 Global Step: 220970 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:17,530-Speed 3262.51 samples/sec Loss 1.0560 LearningRate 0.0012 Epoch: 17 Global Step: 220980 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:20,616-Speed 3318.89 samples/sec Loss 1.0475 LearningRate 0.0012 Epoch: 17 Global Step: 220990 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:23,724-Speed 3296.00 samples/sec Loss 1.0316 LearningRate 0.0012 Epoch: 17 Global Step: 221000 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:26,848-Speed 3278.51 samples/sec Loss 1.0717 LearningRate 0.0012 Epoch: 17 Global Step: 221010 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:54:29,999-Speed 3251.36 samples/sec Loss 1.0854 LearningRate 0.0012 Epoch: 17 Global Step: 221020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:33,102-Speed 3300.62 samples/sec Loss 1.1071 LearningRate 0.0012 Epoch: 17 Global Step: 221030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:36,207-Speed 3299.78 samples/sec Loss 1.0104 LearningRate 0.0012 Epoch: 17 Global Step: 221040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:39,351-Speed 3258.04 samples/sec Loss 1.0430 LearningRate 0.0012 Epoch: 17 Global Step: 221050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:42,507-Speed 3244.80 samples/sec Loss 1.0427 LearningRate 0.0012 Epoch: 17 Global Step: 221060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:45,609-Speed 3303.09 samples/sec Loss 1.0370 LearningRate 0.0012 Epoch: 17 Global Step: 221070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:48,725-Speed 3287.23 samples/sec Loss 1.0659 LearningRate 0.0012 Epoch: 17 Global Step: 221080 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-04-27 20:54:51,867-Speed 3260.14 samples/sec Loss 1.0644 LearningRate 0.0012 Epoch: 17 Global Step: 221090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:55,018-Speed 3251.00 samples/sec Loss 1.0305 LearningRate 0.0012 Epoch: 17 Global Step: 221100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:54:58,073-Speed 3352.28 samples/sec Loss 1.0277 LearningRate 0.0012 Epoch: 17 Global Step: 221110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:01,212-Speed 3263.27 samples/sec Loss 1.0578 LearningRate 0.0012 Epoch: 17 Global Step: 221120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 20:55:04,289-Speed 3328.89 samples/sec Loss 1.0323 LearningRate 0.0012 Epoch: 17 Global Step: 221130 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:07,472-Speed 3217.89 samples/sec Loss 1.0522 LearningRate 0.0012 Epoch: 17 Global Step: 221140 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:10,613-Speed 3261.88 samples/sec Loss 1.0211 LearningRate 0.0012 Epoch: 17 Global Step: 221150 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:13,760-Speed 3254.42 samples/sec Loss 1.0529 LearningRate 0.0012 Epoch: 17 Global Step: 221160 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:16,905-Speed 3257.63 samples/sec Loss 1.0408 LearningRate 0.0012 Epoch: 17 Global Step: 221170 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:20,052-Speed 3254.43 samples/sec Loss 1.0609 LearningRate 0.0012 Epoch: 17 Global Step: 221180 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:23,184-Speed 3270.96 samples/sec Loss 1.0086 LearningRate 0.0012 Epoch: 17 Global Step: 221190 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:26,294-Speed 3294.03 samples/sec Loss 1.0917 LearningRate 0.0012 Epoch: 17 Global Step: 221200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:29,395-Speed 
3303.21 samples/sec Loss 1.0952 LearningRate 0.0012 Epoch: 17 Global Step: 221210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:32,478-Speed 3322.62 samples/sec Loss 1.0348 LearningRate 0.0012 Epoch: 17 Global Step: 221220 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:55:35,578-Speed 3303.68 samples/sec Loss 1.0801 LearningRate 0.0012 Epoch: 17 Global Step: 221230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:38,682-Speed 3299.77 samples/sec Loss 1.0446 LearningRate 0.0012 Epoch: 17 Global Step: 221240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:41,870-Speed 3212.93 samples/sec Loss 1.0312 LearningRate 0.0012 Epoch: 17 Global Step: 221250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:44,949-Speed 3327.50 samples/sec Loss 1.0236 LearningRate 0.0012 Epoch: 17 Global Step: 221260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:48,062-Speed 3290.87 samples/sec Loss 1.0160 LearningRate 0.0012 Epoch: 17 Global Step: 221270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:51,164-Speed 3301.56 samples/sec Loss 1.0629 LearningRate 0.0012 Epoch: 17 Global Step: 221280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:54,297-Speed 3270.14 samples/sec Loss 1.0309 LearningRate 0.0012 Epoch: 17 Global Step: 221290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:55:57,348-Speed 3357.53 samples/sec Loss 1.1011 LearningRate 0.0012 Epoch: 17 Global Step: 221300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:00,450-Speed 3301.26 samples/sec Loss 1.0794 LearningRate 0.0012 Epoch: 17 Global Step: 221310 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:03,663-Speed 3188.01 samples/sec Loss 1.0366 LearningRate 0.0012 Epoch: 17 Global Step: 221320 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:06,814-Speed 3250.59 samples/sec Loss 1.0634 LearningRate 
0.0012 Epoch: 17 Global Step: 221330 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:09,886-Speed 3335.29 samples/sec Loss 1.0585 LearningRate 0.0012 Epoch: 17 Global Step: 221340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:13,142-Speed 3145.66 samples/sec Loss 1.0148 LearningRate 0.0012 Epoch: 17 Global Step: 221350 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:16,323-Speed 3219.92 samples/sec Loss 1.0700 LearningRate 0.0012 Epoch: 17 Global Step: 221360 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:19,500-Speed 3224.65 samples/sec Loss 1.0993 LearningRate 0.0012 Epoch: 17 Global Step: 221370 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:22,568-Speed 3338.40 samples/sec Loss 1.0552 LearningRate 0.0012 Epoch: 17 Global Step: 221380 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:25,749-Speed 3219.65 samples/sec Loss 1.0502 LearningRate 0.0012 Epoch: 17 Global Step: 221390 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:28,883-Speed 3269.43 samples/sec Loss 1.1052 LearningRate 0.0012 Epoch: 17 Global Step: 221400 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:56:31,969-Speed 3319.14 samples/sec Loss 1.0678 LearningRate 0.0012 Epoch: 17 Global Step: 221410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:35,103-Speed 3267.92 samples/sec Loss 1.0303 LearningRate 0.0012 Epoch: 17 Global Step: 221420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:38,270-Speed 3234.74 samples/sec Loss 1.0205 LearningRate 0.0012 Epoch: 17 Global Step: 221430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:41,386-Speed 3286.87 samples/sec Loss 1.0142 LearningRate 0.0012 Epoch: 17 Global Step: 221440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:44,519-Speed 3269.95 samples/sec Loss 1.0634 LearningRate 0.0012 Epoch: 17 Global Step: 221450 Fp16 Grad 
Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:47,626-Speed 3297.33 samples/sec Loss 1.1028 LearningRate 0.0012 Epoch: 17 Global Step: 221460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:50,795-Speed 3232.09 samples/sec Loss 1.0915 LearningRate 0.0012 Epoch: 17 Global Step: 221470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:53,945-Speed 3251.92 samples/sec Loss 1.0478 LearningRate 0.0012 Epoch: 17 Global Step: 221480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:56:57,022-Speed 3329.00 samples/sec Loss 1.0383 LearningRate 0.0012 Epoch: 17 Global Step: 221490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:57:00,078-Speed 3351.64 samples/sec Loss 1.0478 LearningRate 0.0012 Epoch: 17 Global Step: 221500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:03,162-Speed 3322.10 samples/sec Loss 1.0238 LearningRate 0.0012 Epoch: 17 Global Step: 221510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:06,293-Speed 3271.40 samples/sec Loss 1.0498 LearningRate 0.0012 Epoch: 17 Global Step: 221520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:09,391-Speed 3307.06 samples/sec Loss 1.0546 LearningRate 0.0012 Epoch: 17 Global Step: 221530 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:12,566-Speed 3225.71 samples/sec Loss 1.0519 LearningRate 0.0012 Epoch: 17 Global Step: 221540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:15,684-Speed 3285.11 samples/sec Loss 1.0838 LearningRate 0.0012 Epoch: 17 Global Step: 221550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:18,781-Speed 3307.48 samples/sec Loss 1.0581 LearningRate 0.0012 Epoch: 17 Global Step: 221560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:21,888-Speed 3297.34 samples/sec Loss 1.0327 LearningRate 0.0012 Epoch: 17 Global Step: 221570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 
20:57:25,008-Speed 3282.34 samples/sec Loss 1.0402 LearningRate 0.0012 Epoch: 17 Global Step: 221580 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:28,098-Speed 3315.15 samples/sec Loss 1.0416 LearningRate 0.0012 Epoch: 17 Global Step: 221590 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:31,212-Speed 3289.61 samples/sec Loss 1.0420 LearningRate 0.0012 Epoch: 17 Global Step: 221600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:57:34,279-Speed 3339.07 samples/sec Loss 1.0608 LearningRate 0.0012 Epoch: 17 Global Step: 221610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:57:37,364-Speed 3320.51 samples/sec Loss 0.9914 LearningRate 0.0012 Epoch: 17 Global Step: 221620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:57:40,538-Speed 3227.41 samples/sec Loss 1.0300 LearningRate 0.0012 Epoch: 17 Global Step: 221630 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:43,632-Speed 3311.18 samples/sec Loss 1.0047 LearningRate 0.0012 Epoch: 17 Global Step: 221640 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:46,729-Speed 3307.25 samples/sec Loss 1.0181 LearningRate 0.0012 Epoch: 17 Global Step: 221650 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:49,872-Speed 3259.08 samples/sec Loss 1.0089 LearningRate 0.0012 Epoch: 17 Global Step: 221660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:53,035-Speed 3238.23 samples/sec Loss 1.0544 LearningRate 0.0012 Epoch: 17 Global Step: 221670 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:56,136-Speed 3303.59 samples/sec Loss 1.0505 LearningRate 0.0012 Epoch: 17 Global Step: 221680 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:57:59,250-Speed 3289.32 samples/sec Loss 1.0577 LearningRate 0.0012 Epoch: 17 Global Step: 221690 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:02,417-Speed 3234.81 samples/sec Loss 1.0927 
LearningRate 0.0012 Epoch: 17 Global Step: 221700 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:05,537-Speed 3282.61 samples/sec Loss 1.0489 LearningRate 0.0012 Epoch: 17 Global Step: 221710 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:08,642-Speed 3298.69 samples/sec Loss 1.0436 LearningRate 0.0012 Epoch: 17 Global Step: 221720 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:11,747-Speed 3298.95 samples/sec Loss 1.0296 LearningRate 0.0012 Epoch: 17 Global Step: 221730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:58:14,878-Speed 3272.09 samples/sec Loss 1.0689 LearningRate 0.0012 Epoch: 17 Global Step: 221740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:58:17,959-Speed 3324.13 samples/sec Loss 1.0392 LearningRate 0.0012 Epoch: 17 Global Step: 221750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:58:21,033-Speed 3332.97 samples/sec Loss 1.0198 LearningRate 0.0012 Epoch: 17 Global Step: 221760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:58:24,098-Speed 3341.42 samples/sec Loss 1.0751 LearningRate 0.0012 Epoch: 17 Global Step: 221770 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:27,166-Speed 3338.84 samples/sec Loss 1.0545 LearningRate 0.0011 Epoch: 17 Global Step: 221780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:30,248-Speed 3323.65 samples/sec Loss 1.0893 LearningRate 0.0011 Epoch: 17 Global Step: 221790 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:33,312-Speed 3343.21 samples/sec Loss 0.9900 LearningRate 0.0011 Epoch: 17 Global Step: 221800 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:36,379-Speed 3339.84 samples/sec Loss 1.0518 LearningRate 0.0011 Epoch: 17 Global Step: 221810 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:58:39,467-Speed 3316.61 samples/sec Loss 1.0238 LearningRate 0.0011 Epoch: 17 Global Step: 221820 
Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:58:42,542-Speed 3331.50 samples/sec Loss 1.0916 LearningRate 0.0011 Epoch: 17 Global Step: 221830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:58:45,606-Speed 3343.49 samples/sec Loss 1.0652 LearningRate 0.0011 Epoch: 17 Global Step: 221840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:58:48,659-Speed 3354.75 samples/sec Loss 1.0115 LearningRate 0.0011 Epoch: 17 Global Step: 221850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:58:51,731-Speed 3334.47 samples/sec Loss 1.0124 LearningRate 0.0011 Epoch: 17 Global Step: 221860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:58:54,815-Speed 3321.02 samples/sec Loss 1.0178 LearningRate 0.0011 Epoch: 17 Global Step: 221870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:58:57,880-Speed 3357.90 samples/sec Loss 1.0170 LearningRate 0.0011 Epoch: 17 Global Step: 221880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:59:00,984-Speed 3300.15 samples/sec Loss 1.0307 LearningRate 0.0011 Epoch: 17 Global Step: 221890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:59:04,154-Speed 3230.82 samples/sec Loss 1.0324 LearningRate 0.0011 Epoch: 17 Global Step: 221900 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:59:07,270-Speed 3286.86 samples/sec Loss 1.0608 LearningRate 0.0011 Epoch: 17 Global Step: 221910 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 20:59:10,351-Speed 3325.07 samples/sec Loss 1.0305 LearningRate 0.0011 Epoch: 17 Global Step: 221920 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:13,486-Speed 3267.26 samples/sec Loss 1.0173 LearningRate 0.0011 Epoch: 17 Global Step: 221930 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:16,641-Speed 3246.53 samples/sec Loss 1.0133 LearningRate 0.0011 Epoch: 17 Global Step: 221940 Fp16 Grad Scale: 8192 Required: 2 hours Training: 
2022-04-27 20:59:19,766-Speed 3277.88 samples/sec Loss 1.0304 LearningRate 0.0011 Epoch: 17 Global Step: 221950 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:22,868-Speed 3301.61 samples/sec Loss 1.0796 LearningRate 0.0011 Epoch: 17 Global Step: 221960 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:26,043-Speed 3227.10 samples/sec Loss 1.0697 LearningRate 0.0011 Epoch: 17 Global Step: 221970 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:29,170-Speed 3275.72 samples/sec Loss 1.0534 LearningRate 0.0011 Epoch: 17 Global Step: 221980 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:32,251-Speed 3324.59 samples/sec Loss 1.0466 LearningRate 0.0011 Epoch: 17 Global Step: 221990 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:35,318-Speed 3339.98 samples/sec Loss 1.0829 LearningRate 0.0011 Epoch: 17 Global Step: 222000 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:38,456-Speed 3263.81 samples/sec Loss 1.0488 LearningRate 0.0011 Epoch: 17 Global Step: 222010 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 20:59:41,544-Speed 3316.77 samples/sec Loss 1.0499 LearningRate 0.0011 Epoch: 17 Global Step: 222020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:59:44,626-Speed 3323.48 samples/sec Loss 1.0756 LearningRate 0.0011 Epoch: 17 Global Step: 222030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:59:47,704-Speed 3329.02 samples/sec Loss 1.0641 LearningRate 0.0011 Epoch: 17 Global Step: 222040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:59:50,782-Speed 3327.55 samples/sec Loss 1.0426 LearningRate 0.0011 Epoch: 17 Global Step: 222050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:59:53,999-Speed 3183.81 samples/sec Loss 1.0530 LearningRate 0.0011 Epoch: 17 Global Step: 222060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 20:59:57,091-Speed 3313.07 samples/sec 
Loss 1.0238 LearningRate 0.0011 Epoch: 17 Global Step: 222070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:00:00,178-Speed 3318.73 samples/sec Loss 1.0334 LearningRate 0.0011 Epoch: 17 Global Step: 222080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:00:03,355-Speed 3224.02 samples/sec Loss 1.0682 LearningRate 0.0011 Epoch: 17 Global Step: 222090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:00:06,433-Speed 3327.52 samples/sec Loss 1.0276 LearningRate 0.0011 Epoch: 17 Global Step: 222100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:00:09,503-Speed 3336.94 samples/sec Loss 1.0405 LearningRate 0.0011 Epoch: 17 Global Step: 222110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:00:12,600-Speed 3307.08 samples/sec Loss 1.0318 LearningRate 0.0011 Epoch: 17 Global Step: 222120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:00:15,726-Speed 3277.54 samples/sec Loss 1.0556 LearningRate 0.0011 Epoch: 17 Global Step: 222130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:00:18,849-Speed 3279.78 samples/sec Loss 1.0970 LearningRate 0.0011 Epoch: 17 Global Step: 222140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:00:21,911-Speed 3345.29 samples/sec Loss 1.0442 LearningRate 0.0011 Epoch: 17 Global Step: 222150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:00:25,061-Speed 3251.15 samples/sec Loss 1.0697 LearningRate 0.0011 Epoch: 17 Global Step: 222160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:00:28,160-Speed 3305.78 samples/sec Loss 1.0510 LearningRate 0.0011 Epoch: 17 Global Step: 222170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:00:31,300-Speed 3261.62 samples/sec Loss 1.0321 LearningRate 0.0011 Epoch: 17 Global Step: 222180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:00:34,321-Speed 3391.39 samples/sec Loss 1.0337 LearningRate 0.0011 Epoch: 17 
Global Step: 222190 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:37,394-Speed 3333.68 samples/sec Loss 1.0216 LearningRate 0.0011 Epoch: 17 Global Step: 222200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:40,473-Speed 3326.52 samples/sec Loss 1.0267 LearningRate 0.0011 Epoch: 17 Global Step: 222210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:43,564-Speed 3313.43 samples/sec Loss 1.0353 LearningRate 0.0011 Epoch: 17 Global Step: 222220 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:46,657-Speed 3312.51 samples/sec Loss 1.0343 LearningRate 0.0011 Epoch: 17 Global Step: 222230 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:49,736-Speed 3326.12 samples/sec Loss 1.0741 LearningRate 0.0011 Epoch: 17 Global Step: 222240 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:52,796-Speed 3347.61 samples/sec Loss 1.0071 LearningRate 0.0011 Epoch: 17 Global Step: 222250 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:55,874-Speed 3328.29 samples/sec Loss 1.0236 LearningRate 0.0011 Epoch: 17 Global Step: 222260 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:00:58,938-Speed 3342.97 samples/sec Loss 1.0637 LearningRate 0.0011 Epoch: 17 Global Step: 222270 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:01:02,008-Speed 3336.75 samples/sec Loss 1.0360 LearningRate 0.0011 Epoch: 17 Global Step: 222280 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:01:05,070-Speed 3344.68 samples/sec Loss 1.0394 LearningRate 0.0011 Epoch: 17 Global Step: 222290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:08,132-Speed 3345.95 samples/sec Loss 1.0347 LearningRate 0.0011 Epoch: 17 Global Step: 222300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:11,220-Speed 3316.77 samples/sec Loss 1.0461 LearningRate 0.0011 Epoch: 17 Global Step: 222310 Fp16 Grad Scale: 16384 Required: 
2 hours Training: 2022-04-27 21:01:14,339-Speed 3284.07 samples/sec Loss 1.0644 LearningRate 0.0011 Epoch: 17 Global Step: 222320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:17,449-Speed 3293.88 samples/sec Loss 1.0647 LearningRate 0.0011 Epoch: 17 Global Step: 222330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:20,558-Speed 3295.23 samples/sec Loss 1.0661 LearningRate 0.0011 Epoch: 17 Global Step: 222340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:23,654-Speed 3307.87 samples/sec Loss 1.0097 LearningRate 0.0011 Epoch: 17 Global Step: 222350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:26,789-Speed 3267.54 samples/sec Loss 1.0585 LearningRate 0.0011 Epoch: 17 Global Step: 222360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:29,905-Speed 3287.75 samples/sec Loss 1.0331 LearningRate 0.0011 Epoch: 17 Global Step: 222370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:32,988-Speed 3321.84 samples/sec Loss 1.0519 LearningRate 0.0011 Epoch: 17 Global Step: 222380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:36,097-Speed 3294.98 samples/sec Loss 1.0372 LearningRate 0.0011 Epoch: 17 Global Step: 222390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:01:39,220-Speed 3279.62 samples/sec Loss 1.0520 LearningRate 0.0011 Epoch: 17 Global Step: 222400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:01:42,445-Speed 3177.15 samples/sec Loss 1.0499 LearningRate 0.0011 Epoch: 17 Global Step: 222410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:01:45,505-Speed 3347.35 samples/sec Loss 1.0424 LearningRate 0.0011 Epoch: 17 Global Step: 222420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:48,612-Speed 3296.67 samples/sec Loss 1.0589 LearningRate 0.0011 Epoch: 17 Global Step: 222430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
21:01:51,740-Speed 3275.56 samples/sec Loss 1.0284 LearningRate 0.0011 Epoch: 17 Global Step: 222440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:54,890-Speed 3250.93 samples/sec Loss 1.0441 LearningRate 0.0011 Epoch: 17 Global Step: 222450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:01:57,984-Speed 3310.93 samples/sec Loss 1.0284 LearningRate 0.0011 Epoch: 17 Global Step: 222460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:01,090-Speed 3297.81 samples/sec Loss 1.0577 LearningRate 0.0011 Epoch: 17 Global Step: 222470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:04,225-Speed 3267.41 samples/sec Loss 1.0456 LearningRate 0.0011 Epoch: 17 Global Step: 222480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:07,329-Speed 3300.76 samples/sec Loss 1.0451 LearningRate 0.0011 Epoch: 17 Global Step: 222490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:10,429-Speed 3303.52 samples/sec Loss 1.0339 LearningRate 0.0011 Epoch: 17 Global Step: 222500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:13,556-Speed 3275.75 samples/sec Loss 1.0726 LearningRate 0.0011 Epoch: 17 Global Step: 222510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:16,619-Speed 3344.73 samples/sec Loss 1.0834 LearningRate 0.0011 Epoch: 17 Global Step: 222520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:19,733-Speed 3289.48 samples/sec Loss 1.0406 LearningRate 0.0011 Epoch: 17 Global Step: 222530 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:22,821-Speed 3316.76 samples/sec Loss 1.0451 LearningRate 0.0011 Epoch: 17 Global Step: 222540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:25,922-Speed 3303.13 samples/sec Loss 1.0300 LearningRate 0.0011 Epoch: 17 Global Step: 222550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:29,074-Speed 3249.50 samples/sec Loss 
1.0261 LearningRate 0.0011 Epoch: 17 Global Step: 222560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:32,147-Speed 3333.22 samples/sec Loss 1.0623 LearningRate 0.0011 Epoch: 17 Global Step: 222570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:35,229-Speed 3324.34 samples/sec Loss 1.0891 LearningRate 0.0011 Epoch: 17 Global Step: 222580 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:38,356-Speed 3276.26 samples/sec Loss 1.0522 LearningRate 0.0011 Epoch: 17 Global Step: 222590 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:41,415-Speed 3348.19 samples/sec Loss 1.0360 LearningRate 0.0011 Epoch: 17 Global Step: 222600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:02:44,533-Speed 3286.04 samples/sec Loss 1.0494 LearningRate 0.0011 Epoch: 17 Global Step: 222610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:47,624-Speed 3313.31 samples/sec Loss 1.0413 LearningRate 0.0011 Epoch: 17 Global Step: 222620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:50,813-Speed 3212.70 samples/sec Loss 0.9841 LearningRate 0.0011 Epoch: 17 Global Step: 222630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:53,979-Speed 3234.72 samples/sec Loss 1.0313 LearningRate 0.0011 Epoch: 17 Global Step: 222640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:02:57,031-Speed 3356.36 samples/sec Loss 1.0812 LearningRate 0.0011 Epoch: 17 Global Step: 222650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:00,177-Speed 3256.05 samples/sec Loss 1.0482 LearningRate 0.0011 Epoch: 17 Global Step: 222660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:03,275-Speed 3307.21 samples/sec Loss 1.0504 LearningRate 0.0011 Epoch: 17 Global Step: 222670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:06,388-Speed 3289.64 samples/sec Loss 0.9976 LearningRate 0.0011 Epoch: 17 Global 
Step: 222680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:09,480-Speed 3313.28 samples/sec Loss 1.0711 LearningRate 0.0011 Epoch: 17 Global Step: 222690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:12,583-Speed 3300.58 samples/sec Loss 1.0007 LearningRate 0.0011 Epoch: 17 Global Step: 222700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:15,696-Speed 3290.95 samples/sec Loss 1.0449 LearningRate 0.0011 Epoch: 17 Global Step: 222710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:03:18,756-Speed 3347.72 samples/sec Loss 1.0456 LearningRate 0.0011 Epoch: 17 Global Step: 222720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:21,857-Speed 3303.04 samples/sec Loss 1.0826 LearningRate 0.0011 Epoch: 17 Global Step: 222730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:24,919-Speed 3345.73 samples/sec Loss 1.0023 LearningRate 0.0011 Epoch: 17 Global Step: 222740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:28,087-Speed 3232.49 samples/sec Loss 1.0227 LearningRate 0.0011 Epoch: 17 Global Step: 222750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:31,239-Speed 3250.06 samples/sec Loss 1.0643 LearningRate 0.0011 Epoch: 17 Global Step: 222760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:34,336-Speed 3307.18 samples/sec Loss 1.0769 LearningRate 0.0011 Epoch: 17 Global Step: 222770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:37,512-Speed 3225.47 samples/sec Loss 1.0676 LearningRate 0.0011 Epoch: 17 Global Step: 222780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:03:40,617-Speed 3299.50 samples/sec Loss 1.0510 LearningRate 0.0011 Epoch: 17 Global Step: 222790 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:03:43,713-Speed 3308.80 samples/sec Loss 1.0283 LearningRate 0.0011 Epoch: 17 Global Step: 222800 Fp16 Grad Scale: 8192 Required: 
2 hours Training: 2022-04-27 21:03:46,786-Speed 3333.35 samples/sec Loss 1.0741 LearningRate 0.0011 Epoch: 17 Global Step: 222810 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:03:49,855-Speed 3336.91 samples/sec Loss 1.0240 LearningRate 0.0011 Epoch: 17 Global Step: 222820 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:03:52,957-Speed 3303.24 samples/sec Loss 1.0422 LearningRate 0.0011 Epoch: 17 Global Step: 222830 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:03:56,067-Speed 3293.02 samples/sec Loss 1.0259 LearningRate 0.0011 Epoch: 17 Global Step: 222840 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:03:59,163-Speed 3308.51 samples/sec Loss 0.9811 LearningRate 0.0011 Epoch: 17 Global Step: 222850 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:04:02,308-Speed 3257.26 samples/sec Loss 1.0773 LearningRate 0.0011 Epoch: 17 Global Step: 222860 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:04:05,471-Speed 3238.23 samples/sec Loss 1.0375 LearningRate 0.0011 Epoch: 17 Global Step: 222870 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:04:08,553-Speed 3323.04 samples/sec Loss 1.0385 LearningRate 0.0011 Epoch: 17 Global Step: 222880 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:04:11,641-Speed 3317.62 samples/sec Loss 1.0347 LearningRate 0.0011 Epoch: 17 Global Step: 222890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:14,716-Speed 3331.55 samples/sec Loss 1.0724 LearningRate 0.0011 Epoch: 17 Global Step: 222900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:17,778-Speed 3344.75 samples/sec Loss 1.0193 LearningRate 0.0011 Epoch: 17 Global Step: 222910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:20,858-Speed 3325.94 samples/sec Loss 1.0497 LearningRate 0.0011 Epoch: 17 Global Step: 222920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:23,975-Speed 
3286.78 samples/sec Loss 1.0586 LearningRate 0.0011 Epoch: 17 Global Step: 222930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:27,164-Speed 3211.57 samples/sec Loss 1.0450 LearningRate 0.0011 Epoch: 17 Global Step: 222940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:30,301-Speed 3264.98 samples/sec Loss 1.0289 LearningRate 0.0011 Epoch: 17 Global Step: 222950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:33,365-Speed 3343.29 samples/sec Loss 1.0498 LearningRate 0.0011 Epoch: 17 Global Step: 222960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:36,507-Speed 3260.15 samples/sec Loss 1.0233 LearningRate 0.0010 Epoch: 17 Global Step: 222970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:39,592-Speed 3320.35 samples/sec Loss 1.0154 LearningRate 0.0010 Epoch: 17 Global Step: 222980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:42,816-Speed 3177.19 samples/sec Loss 1.0179 LearningRate 0.0010 Epoch: 17 Global Step: 222990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:04:45,888-Speed 3334.54 samples/sec Loss 1.0477 LearningRate 0.0010 Epoch: 17 Global Step: 223000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:04:48,932-Speed 3365.31 samples/sec Loss 0.9990 LearningRate 0.0010 Epoch: 17 Global Step: 223010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:52,023-Speed 3314.40 samples/sec Loss 1.0154 LearningRate 0.0010 Epoch: 17 Global Step: 223020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:55,146-Speed 3279.32 samples/sec Loss 1.0252 LearningRate 0.0010 Epoch: 17 Global Step: 223030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:04:58,232-Speed 3319.62 samples/sec Loss 1.0574 LearningRate 0.0010 Epoch: 17 Global Step: 223040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:05:01,288-Speed 3351.64 samples/sec Loss 1.0217 
LearningRate 0.0010 Epoch: 17 Global Step: 223050 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:04,381-Speed 3312.17 samples/sec Loss 1.0699 LearningRate 0.0010 Epoch: 17 Global Step: 223060 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:07,472-Speed 3313.25 samples/sec Loss 1.0129 LearningRate 0.0010 Epoch: 17 Global Step: 223070 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:10,543-Speed 3335.60 samples/sec Loss 1.0165 LearningRate 0.0010 Epoch: 17 Global Step: 223080 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:13,689-Speed 3256.18 samples/sec Loss 1.0667 LearningRate 0.0010 Epoch: 17 Global Step: 223090 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:16,771-Speed 3323.74 samples/sec Loss 1.0307 LearningRate 0.0010 Epoch: 17 Global Step: 223100 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:19,854-Speed 3322.47 samples/sec Loss 1.0306 LearningRate 0.0010 Epoch: 17 Global Step: 223110 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:22,966-Speed 3291.74 samples/sec Loss 0.9923 LearningRate 0.0010 Epoch: 17 Global Step: 223120 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:26,141-Speed 3225.61 samples/sec Loss 1.0256 LearningRate 0.0010 Epoch: 17 Global Step: 223130 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:29,202-Speed 3346.63 samples/sec Loss 1.0310 LearningRate 0.0010 Epoch: 17 Global Step: 223140 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:32,344-Speed 3261.75 samples/sec Loss 1.0340 LearningRate 0.0010 Epoch: 17 Global Step: 223150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:05:35,451-Speed 3296.48 samples/sec Loss 1.0490 LearningRate 0.0010 Epoch: 17 Global Step: 223160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:05:38,594-Speed 3259.45 samples/sec Loss 1.0224 LearningRate 0.0010 Epoch: 17 Global Step: 223170 
Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:05:41,695-Speed 3302.30 samples/sec Loss 1.0217 LearningRate 0.0010 Epoch: 17 Global Step: 223180 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:44,808-Speed 3290.49 samples/sec Loss 1.0580 LearningRate 0.0010 Epoch: 17 Global Step: 223190 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:47,906-Speed 3306.54 samples/sec Loss 1.0563 LearningRate 0.0010 Epoch: 17 Global Step: 223200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:51,023-Speed 3286.87 samples/sec Loss 1.0879 LearningRate 0.0010 Epoch: 17 Global Step: 223210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:54,187-Speed 3237.37 samples/sec Loss 1.0101 LearningRate 0.0010 Epoch: 17 Global Step: 223220 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:05:57,280-Speed 3311.81 samples/sec Loss 1.0727 LearningRate 0.0010 Epoch: 17 Global Step: 223230 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:06:00,355-Speed 3330.34 samples/sec Loss 1.0180 LearningRate 0.0010 Epoch: 17 Global Step: 223240 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:06:03,443-Speed 3317.72 samples/sec Loss 1.0509 LearningRate 0.0010 Epoch: 17 Global Step: 223250 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:06:06,539-Speed 3308.72 samples/sec Loss 1.0022 LearningRate 0.0010 Epoch: 17 Global Step: 223260 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:06:09,645-Speed 3297.52 samples/sec Loss 1.0384 LearningRate 0.0010 Epoch: 17 Global Step: 223270 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:06:12,806-Speed 3240.42 samples/sec Loss 1.0321 LearningRate 0.0010 Epoch: 17 Global Step: 223280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:15,916-Speed 3294.94 samples/sec Loss 1.0345 LearningRate 0.0010 Epoch: 17 Global Step: 223290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-04-27 21:06:19,067-Speed 3250.33 samples/sec Loss 1.0160 LearningRate 0.0010 Epoch: 17 Global Step: 223300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:22,137-Speed 3336.30 samples/sec Loss 1.1052 LearningRate 0.0010 Epoch: 17 Global Step: 223310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:25,232-Speed 3309.58 samples/sec Loss 1.0084 LearningRate 0.0010 Epoch: 17 Global Step: 223320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:28,365-Speed 3270.30 samples/sec Loss 1.0071 LearningRate 0.0010 Epoch: 17 Global Step: 223330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:31,438-Speed 3333.22 samples/sec Loss 1.0566 LearningRate 0.0010 Epoch: 17 Global Step: 223340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:34,577-Speed 3263.21 samples/sec Loss 1.0328 LearningRate 0.0010 Epoch: 17 Global Step: 223350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:37,679-Speed 3302.47 samples/sec Loss 1.0489 LearningRate 0.0010 Epoch: 17 Global Step: 223360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:40,734-Speed 3352.88 samples/sec Loss 1.0290 LearningRate 0.0010 Epoch: 17 Global Step: 223370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:06:43,832-Speed 3306.13 samples/sec Loss 1.0469 LearningRate 0.0010 Epoch: 17 Global Step: 223380 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:06:46,917-Speed 3320.66 samples/sec Loss 1.0542 LearningRate 0.0010 Epoch: 17 Global Step: 223390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:06:49,987-Speed 3336.46 samples/sec Loss 1.0040 LearningRate 0.0010 Epoch: 17 Global Step: 223400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:06:53,126-Speed 3263.33 samples/sec Loss 1.0270 LearningRate 0.0010 Epoch: 17 Global Step: 223410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:06:56,223-Speed 3307.63 
samples/sec Loss 1.0373 LearningRate 0.0010 Epoch: 17 Global Step: 223420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:06:59,289-Speed 3341.26 samples/sec Loss 1.0613 LearningRate 0.0010 Epoch: 17 Global Step: 223430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:07:02,379-Speed 3314.70 samples/sec Loss 1.0988 LearningRate 0.0010 Epoch: 17 Global Step: 223440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:07:05,446-Speed 3340.45 samples/sec Loss 1.0681 LearningRate 0.0010 Epoch: 17 Global Step: 223450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:07:08,552-Speed 3297.83 samples/sec Loss 1.0248 LearningRate 0.0010 Epoch: 17 Global Step: 223460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:07:11,642-Speed 3314.57 samples/sec Loss 1.0262 LearningRate 0.0010 Epoch: 17 Global Step: 223470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:07:14,762-Speed 3282.65 samples/sec Loss 1.0132 LearningRate 0.0010 Epoch: 17 Global Step: 223480 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:17,929-Speed 3235.08 samples/sec Loss 0.9918 LearningRate 0.0010 Epoch: 17 Global Step: 223490 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:21,023-Speed 3310.26 samples/sec Loss 1.0265 LearningRate 0.0010 Epoch: 17 Global Step: 223500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:24,133-Speed 3293.95 samples/sec Loss 1.0201 LearningRate 0.0010 Epoch: 17 Global Step: 223510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:27,211-Speed 3327.95 samples/sec Loss 1.0034 LearningRate 0.0010 Epoch: 17 Global Step: 223520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:30,365-Speed 3248.16 samples/sec Loss 1.0365 LearningRate 0.0010 Epoch: 17 Global Step: 223530 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:33,506-Speed 3260.62 samples/sec Loss 1.0433 LearningRate 0.0010 Epoch: 17 
Global Step: 223540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:36,574-Speed 3338.78 samples/sec Loss 1.0351 LearningRate 0.0010 Epoch: 17 Global Step: 223550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:39,710-Speed 3266.17 samples/sec Loss 1.0136 LearningRate 0.0010 Epoch: 17 Global Step: 223560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:07:43,008-Speed 3105.88 samples/sec Loss 1.0477 LearningRate 0.0010 Epoch: 17 Global Step: 223570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:08:15,413-Speed 316.02 samples/sec Loss 1.0138 LearningRate 0.0010 Epoch: 18 Global Step: 223580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:18,526-Speed 3290.69 samples/sec Loss 0.8162 LearningRate 0.0010 Epoch: 18 Global Step: 223590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:21,820-Speed 3109.56 samples/sec Loss 0.8567 LearningRate 0.0010 Epoch: 18 Global Step: 223600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:24,924-Speed 3299.95 samples/sec Loss 0.8893 LearningRate 0.0010 Epoch: 18 Global Step: 223610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:28,057-Speed 3269.52 samples/sec Loss 0.8859 LearningRate 0.0010 Epoch: 18 Global Step: 223620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:31,150-Speed 3311.89 samples/sec Loss 0.8467 LearningRate 0.0010 Epoch: 18 Global Step: 223630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:34,251-Speed 3303.47 samples/sec Loss 0.8408 LearningRate 0.0010 Epoch: 18 Global Step: 223640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:37,376-Speed 3278.53 samples/sec Loss 0.8468 LearningRate 0.0010 Epoch: 18 Global Step: 223650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:40,510-Speed 3268.28 samples/sec Loss 0.8317 LearningRate 0.0010 Epoch: 18 Global Step: 223660 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 21:08:43,643-Speed 3268.80 samples/sec Loss 0.8397 LearningRate 0.0010 Epoch: 18 Global Step: 223670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:46,830-Speed 3214.91 samples/sec Loss 0.8295 LearningRate 0.0010 Epoch: 18 Global Step: 223680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:08:49,914-Speed 3321.37 samples/sec Loss 0.8296 LearningRate 0.0010 Epoch: 18 Global Step: 223690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:08:53,023-Speed 3294.50 samples/sec Loss 0.8791 LearningRate 0.0010 Epoch: 18 Global Step: 223700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:56,114-Speed 3313.78 samples/sec Loss 0.8525 LearningRate 0.0010 Epoch: 18 Global Step: 223710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:08:59,218-Speed 3299.54 samples/sec Loss 0.8850 LearningRate 0.0010 Epoch: 18 Global Step: 223720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:02,380-Speed 3240.48 samples/sec Loss 0.8934 LearningRate 0.0010 Epoch: 18 Global Step: 223730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:05,467-Speed 3317.54 samples/sec Loss 0.8024 LearningRate 0.0010 Epoch: 18 Global Step: 223740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:08,535-Speed 3338.69 samples/sec Loss 0.8434 LearningRate 0.0010 Epoch: 18 Global Step: 223750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:11,610-Speed 3331.59 samples/sec Loss 0.8840 LearningRate 0.0010 Epoch: 18 Global Step: 223760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:14,682-Speed 3334.93 samples/sec Loss 0.8895 LearningRate 0.0010 Epoch: 18 Global Step: 223770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:17,813-Speed 3270.33 samples/sec Loss 0.8685 LearningRate 0.0010 Epoch: 18 Global Step: 223780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
21:09:20,925-Speed 3291.49 samples/sec Loss 0.8430 LearningRate 0.0010 Epoch: 18 Global Step: 223790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:24,043-Speed 3285.28 samples/sec Loss 0.8320 LearningRate 0.0010 Epoch: 18 Global Step: 223800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:09:27,121-Speed 3328.76 samples/sec Loss 0.8447 LearningRate 0.0010 Epoch: 18 Global Step: 223810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:30,271-Speed 3250.96 samples/sec Loss 0.8803 LearningRate 0.0010 Epoch: 18 Global Step: 223820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:33,360-Speed 3317.21 samples/sec Loss 0.8038 LearningRate 0.0010 Epoch: 18 Global Step: 223830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:36,468-Speed 3295.19 samples/sec Loss 0.8420 LearningRate 0.0010 Epoch: 18 Global Step: 223840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:39,582-Speed 3290.01 samples/sec Loss 0.8385 LearningRate 0.0010 Epoch: 18 Global Step: 223850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:42,699-Speed 3285.36 samples/sec Loss 0.8393 LearningRate 0.0010 Epoch: 18 Global Step: 223860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:45,870-Speed 3230.58 samples/sec Loss 0.8477 LearningRate 0.0010 Epoch: 18 Global Step: 223870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:49,204-Speed 3072.40 samples/sec Loss 0.8376 LearningRate 0.0010 Epoch: 18 Global Step: 223880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:52,341-Speed 3265.19 samples/sec Loss 0.8372 LearningRate 0.0010 Epoch: 18 Global Step: 223890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:55,478-Speed 3265.80 samples/sec Loss 0.8495 LearningRate 0.0010 Epoch: 18 Global Step: 223900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:09:58,535-Speed 3350.10 samples/sec Loss 
0.8418 LearningRate 0.0010 Epoch: 18 Global Step: 223910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:10:01,624-Speed 3316.67 samples/sec Loss 0.8502 LearningRate 0.0010 Epoch: 18 Global Step: 223920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:10:04,708-Speed 3320.97 samples/sec Loss 0.8375 LearningRate 0.0010 Epoch: 18 Global Step: 223930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:07,805-Speed 3307.41 samples/sec Loss 0.8270 LearningRate 0.0010 Epoch: 18 Global Step: 223940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:10,908-Speed 3301.15 samples/sec Loss 0.8325 LearningRate 0.0010 Epoch: 18 Global Step: 223950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:14,043-Speed 3267.64 samples/sec Loss 0.8313 LearningRate 0.0010 Epoch: 18 Global Step: 223960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:17,124-Speed 3324.30 samples/sec Loss 0.8589 LearningRate 0.0010 Epoch: 18 Global Step: 223970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:20,182-Speed 3349.84 samples/sec Loss 0.8183 LearningRate 0.0010 Epoch: 18 Global Step: 223980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:23,289-Speed 3296.42 samples/sec Loss 0.8435 LearningRate 0.0010 Epoch: 18 Global Step: 223990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:26,374-Speed 3321.17 samples/sec Loss 0.8011 LearningRate 0.0010 Epoch: 18 Global Step: 224000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:29,464-Speed 3315.13 samples/sec Loss 0.8643 LearningRate 0.0010 Epoch: 18 Global Step: 224010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:32,552-Speed 3317.02 samples/sec Loss 0.8514 LearningRate 0.0010 Epoch: 18 Global Step: 224020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:35,638-Speed 3319.34 samples/sec Loss 0.8040 LearningRate 0.0010 Epoch: 18 Global 
Step: 224030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:10:38,740-Speed 3301.68 samples/sec Loss 0.8526 LearningRate 0.0010 Epoch: 18 Global Step: 224040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:41,821-Speed 3324.95 samples/sec Loss 0.8211 LearningRate 0.0010 Epoch: 18 Global Step: 224050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:44,896-Speed 3331.31 samples/sec Loss 0.8569 LearningRate 0.0010 Epoch: 18 Global Step: 224060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:48,084-Speed 3213.05 samples/sec Loss 0.8448 LearningRate 0.0010 Epoch: 18 Global Step: 224070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:51,238-Speed 3247.47 samples/sec Loss 0.8241 LearningRate 0.0010 Epoch: 18 Global Step: 224080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:54,410-Speed 3229.56 samples/sec Loss 0.8389 LearningRate 0.0010 Epoch: 18 Global Step: 224090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:10:57,498-Speed 3316.59 samples/sec Loss 0.8168 LearningRate 0.0010 Epoch: 18 Global Step: 224100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:00,632-Speed 3268.93 samples/sec Loss 0.8293 LearningRate 0.0010 Epoch: 18 Global Step: 224110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:03,725-Speed 3311.57 samples/sec Loss 0.8218 LearningRate 0.0010 Epoch: 18 Global Step: 224120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:06,866-Speed 3260.79 samples/sec Loss 0.8387 LearningRate 0.0010 Epoch: 18 Global Step: 224130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:09,935-Speed 3338.09 samples/sec Loss 0.8444 LearningRate 0.0010 Epoch: 18 Global Step: 224140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:11:13,131-Speed 3205.22 samples/sec Loss 0.8667 LearningRate 0.0010 Epoch: 18 Global Step: 224150 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 21:11:16,286-Speed 3246.82 samples/sec Loss 0.8611 LearningRate 0.0010 Epoch: 18 Global Step: 224160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:11:19,346-Speed 3347.13 samples/sec Loss 0.8078 LearningRate 0.0010 Epoch: 18 Global Step: 224170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:11:22,405-Speed 3347.92 samples/sec Loss 0.8350 LearningRate 0.0010 Epoch: 18 Global Step: 224180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:11:25,510-Speed 3299.64 samples/sec Loss 0.8192 LearningRate 0.0010 Epoch: 18 Global Step: 224190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:28,653-Speed 3259.04 samples/sec Loss 0.8789 LearningRate 0.0010 Epoch: 18 Global Step: 224200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:31,765-Speed 3291.96 samples/sec Loss 0.8469 LearningRate 0.0009 Epoch: 18 Global Step: 224210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:34,873-Speed 3294.97 samples/sec Loss 0.8258 LearningRate 0.0009 Epoch: 18 Global Step: 224220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:37,937-Speed 3343.48 samples/sec Loss 0.8876 LearningRate 0.0009 Epoch: 18 Global Step: 224230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:41,076-Speed 3262.94 samples/sec Loss 0.8530 LearningRate 0.0009 Epoch: 18 Global Step: 224240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:44,244-Speed 3233.35 samples/sec Loss 0.8744 LearningRate 0.0009 Epoch: 18 Global Step: 224250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:47,307-Speed 3344.94 samples/sec Loss 0.8580 LearningRate 0.0009 Epoch: 18 Global Step: 224260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:50,387-Speed 3325.64 samples/sec Loss 0.8507 LearningRate 0.0009 Epoch: 18 Global Step: 224270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
21:11:53,562-Speed 3225.78 samples/sec Loss 0.8365 LearningRate 0.0009 Epoch: 18 Global Step: 224280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:11:56,650-Speed 3317.96 samples/sec Loss 0.8426 LearningRate 0.0009 Epoch: 18 Global Step: 224290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:11:59,742-Speed 3312.62 samples/sec Loss 0.8585 LearningRate 0.0009 Epoch: 18 Global Step: 224300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:02,857-Speed 3287.99 samples/sec Loss 0.8529 LearningRate 0.0009 Epoch: 18 Global Step: 224310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:05,969-Speed 3292.15 samples/sec Loss 0.8337 LearningRate 0.0009 Epoch: 18 Global Step: 224320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:09,023-Speed 3353.96 samples/sec Loss 0.8784 LearningRate 0.0009 Epoch: 18 Global Step: 224330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:12,120-Speed 3306.50 samples/sec Loss 0.8237 LearningRate 0.0009 Epoch: 18 Global Step: 224340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:15,211-Speed 3314.28 samples/sec Loss 0.8311 LearningRate 0.0009 Epoch: 18 Global Step: 224350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:18,310-Speed 3305.74 samples/sec Loss 0.8466 LearningRate 0.0009 Epoch: 18 Global Step: 224360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:21,380-Speed 3335.98 samples/sec Loss 0.8270 LearningRate 0.0009 Epoch: 18 Global Step: 224370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:24,485-Speed 3299.07 samples/sec Loss 0.8157 LearningRate 0.0009 Epoch: 18 Global Step: 224380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:27,595-Speed 3293.71 samples/sec Loss 0.8455 LearningRate 0.0009 Epoch: 18 Global Step: 224390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:30,728-Speed 3270.11 samples/sec Loss 
0.8393 LearningRate 0.0009 Epoch: 18 Global Step: 224400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:33,814-Speed 3319.37 samples/sec Loss 0.8635 LearningRate 0.0009 Epoch: 18 Global Step: 224410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:36,941-Speed 3275.21 samples/sec Loss 0.8331 LearningRate 0.0009 Epoch: 18 Global Step: 224420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:40,047-Speed 3297.68 samples/sec Loss 0.8513 LearningRate 0.0009 Epoch: 18 Global Step: 224430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:43,164-Speed 3286.68 samples/sec Loss 0.8364 LearningRate 0.0009 Epoch: 18 Global Step: 224440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:12:46,257-Speed 3311.44 samples/sec Loss 0.8321 LearningRate 0.0009 Epoch: 18 Global Step: 224450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:49,330-Speed 3333.84 samples/sec Loss 0.8499 LearningRate 0.0009 Epoch: 18 Global Step: 224460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:52,422-Speed 3312.64 samples/sec Loss 0.8754 LearningRate 0.0009 Epoch: 18 Global Step: 224470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:55,540-Speed 3285.41 samples/sec Loss 0.8546 LearningRate 0.0009 Epoch: 18 Global Step: 224480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:12:58,618-Speed 3327.58 samples/sec Loss 0.8103 LearningRate 0.0009 Epoch: 18 Global Step: 224490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:13:01,757-Speed 3262.65 samples/sec Loss 0.8133 LearningRate 0.0009 Epoch: 18 Global Step: 224500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:13:04,888-Speed 3271.54 samples/sec Loss 0.8465 LearningRate 0.0009 Epoch: 18 Global Step: 224510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:13:07,978-Speed 3315.59 samples/sec Loss 0.8486 LearningRate 0.0009 Epoch: 18 Global 
Step: 224520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:11,055-Speed 3328.99 samples/sec Loss 0.8809 LearningRate 0.0009 Epoch: 18 Global Step: 224530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:14,205-Speed 3251.71 samples/sec Loss 0.8299 LearningRate 0.0009 Epoch: 18 Global Step: 224540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:17,324-Speed 3283.60 samples/sec Loss 0.8514 LearningRate 0.0009 Epoch: 18 Global Step: 224550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:20,494-Speed 3231.17 samples/sec Loss 0.8903 LearningRate 0.0009 Epoch: 18 Global Step: 224560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:23,609-Speed 3289.15 samples/sec Loss 0.8657 LearningRate 0.0009 Epoch: 18 Global Step: 224570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:26,773-Speed 3237.19 samples/sec Loss 0.8502 LearningRate 0.0009 Epoch: 18 Global Step: 224580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:29,912-Speed 3263.30 samples/sec Loss 0.8636 LearningRate 0.0009 Epoch: 18 Global Step: 224590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:13:32,973-Speed 3345.50 samples/sec Loss 0.8503 LearningRate 0.0009 Epoch: 18 Global Step: 224600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:13:36,068-Speed 3310.14 samples/sec Loss 0.8346 LearningRate 0.0009 Epoch: 18 Global Step: 224610 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:13:39,248-Speed 3221.01 samples/sec Loss 0.8153 LearningRate 0.0009 Epoch: 18 Global Step: 224620 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:13:42,372-Speed 3278.97 samples/sec Loss 0.8553 LearningRate 0.0009 Epoch: 18 Global Step: 224630 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:13:45,470-Speed 3307.32 samples/sec Loss 0.8536 LearningRate 0.0009 Epoch: 18 Global Step: 224640 Fp16 Grad Scale: 8192 Required: 2 
hours Training: 2022-04-27 21:13:48,632-Speed 3238.73 samples/sec Loss 0.8678 LearningRate 0.0009 Epoch: 18 Global Step: 224650 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:13:51,777-Speed 3257.29 samples/sec Loss 0.8325 LearningRate 0.0009 Epoch: 18 Global Step: 224660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:13:54,935-Speed 3246.82 samples/sec Loss 0.8516 LearningRate 0.0009 Epoch: 18 Global Step: 224670 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:13:57,997-Speed 3345.55 samples/sec Loss 0.8895 LearningRate 0.0009 Epoch: 18 Global Step: 224680 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:14:01,095-Speed 3305.47 samples/sec Loss 0.8427 LearningRate 0.0009 Epoch: 18 Global Step: 224690 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:14:04,231-Speed 3267.32 samples/sec Loss 0.8454 LearningRate 0.0009 Epoch: 18 Global Step: 224700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:07,394-Speed 3238.40 samples/sec Loss 0.8725 LearningRate 0.0009 Epoch: 18 Global Step: 224710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:10,509-Speed 3287.71 samples/sec Loss 0.8568 LearningRate 0.0009 Epoch: 18 Global Step: 224720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:13,633-Speed 3279.49 samples/sec Loss 0.8498 LearningRate 0.0009 Epoch: 18 Global Step: 224730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:16,810-Speed 3224.06 samples/sec Loss 0.8210 LearningRate 0.0009 Epoch: 18 Global Step: 224740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:19,916-Speed 3298.61 samples/sec Loss 0.8078 LearningRate 0.0009 Epoch: 18 Global Step: 224750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:22,975-Speed 3349.05 samples/sec Loss 0.8476 LearningRate 0.0009 Epoch: 18 Global Step: 224760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:26,195-Speed 
3181.57 samples/sec Loss 0.8564 LearningRate 0.0009 Epoch: 18 Global Step: 224770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:29,330-Speed 3266.85 samples/sec Loss 0.8315 LearningRate 0.0009 Epoch: 18 Global Step: 224780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:32,449-Speed 3283.61 samples/sec Loss 0.8832 LearningRate 0.0009 Epoch: 18 Global Step: 224790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:35,535-Speed 3320.27 samples/sec Loss 0.8343 LearningRate 0.0009 Epoch: 18 Global Step: 224800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:14:38,681-Speed 3255.51 samples/sec Loss 0.8657 LearningRate 0.0009 Epoch: 18 Global Step: 224810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:14:41,818-Speed 3265.34 samples/sec Loss 0.8608 LearningRate 0.0009 Epoch: 18 Global Step: 224820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:14:44,858-Speed 3369.12 samples/sec Loss 0.8813 LearningRate 0.0009 Epoch: 18 Global Step: 224830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:47,978-Speed 3283.35 samples/sec Loss 0.8803 LearningRate 0.0009 Epoch: 18 Global Step: 224840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:51,131-Speed 3248.47 samples/sec Loss 0.8498 LearningRate 0.0009 Epoch: 18 Global Step: 224850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:54,222-Speed 3313.83 samples/sec Loss 0.8445 LearningRate 0.0009 Epoch: 18 Global Step: 224860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:14:57,290-Speed 3338.95 samples/sec Loss 0.8875 LearningRate 0.0009 Epoch: 18 Global Step: 224870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:00,386-Speed 3308.39 samples/sec Loss 0.8476 LearningRate 0.0009 Epoch: 18 Global Step: 224880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:03,459-Speed 3334.11 samples/sec Loss 0.8553 
LearningRate 0.0009 Epoch: 18 Global Step: 224890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:06,720-Speed 3140.66 samples/sec Loss 0.8479 LearningRate 0.0009 Epoch: 18 Global Step: 224900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:09,799-Speed 3326.68 samples/sec Loss 0.8325 LearningRate 0.0009 Epoch: 18 Global Step: 224910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:12,993-Speed 3207.28 samples/sec Loss 0.7873 LearningRate 0.0009 Epoch: 18 Global Step: 224920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:16,176-Speed 3218.19 samples/sec Loss 0.8528 LearningRate 0.0009 Epoch: 18 Global Step: 224930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:15:19,359-Speed 3217.85 samples/sec Loss 0.8399 LearningRate 0.0009 Epoch: 18 Global Step: 224940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:15:22,417-Speed 3349.25 samples/sec Loss 0.8799 LearningRate 0.0009 Epoch: 18 Global Step: 224950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:25,541-Speed 3279.41 samples/sec Loss 0.8704 LearningRate 0.0009 Epoch: 18 Global Step: 224960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:28,709-Speed 3232.55 samples/sec Loss 0.8513 LearningRate 0.0009 Epoch: 18 Global Step: 224970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:31,877-Speed 3234.06 samples/sec Loss 0.8159 LearningRate 0.0009 Epoch: 18 Global Step: 224980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:35,001-Speed 3277.98 samples/sec Loss 0.8344 LearningRate 0.0009 Epoch: 18 Global Step: 224990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:38,139-Speed 3264.62 samples/sec Loss 0.8466 LearningRate 0.0009 Epoch: 18 Global Step: 225000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:41,279-Speed 3263.10 samples/sec Loss 0.8565 LearningRate 0.0009 Epoch: 18 Global Step: 
225010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:44,352-Speed 3332.82 samples/sec Loss 0.8324 LearningRate 0.0009 Epoch: 18 Global Step: 225020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:47,460-Speed 3295.20 samples/sec Loss 0.7852 LearningRate 0.0009 Epoch: 18 Global Step: 225030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:50,617-Speed 3244.66 samples/sec Loss 0.8675 LearningRate 0.0009 Epoch: 18 Global Step: 225040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:53,758-Speed 3261.07 samples/sec Loss 0.8291 LearningRate 0.0009 Epoch: 18 Global Step: 225050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:15:56,815-Speed 3351.99 samples/sec Loss 0.8271 LearningRate 0.0009 Epoch: 18 Global Step: 225060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:15:59,906-Speed 3313.23 samples/sec Loss 0.8828 LearningRate 0.0009 Epoch: 18 Global Step: 225070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:03,154-Speed 3153.83 samples/sec Loss 0.8425 LearningRate 0.0009 Epoch: 18 Global Step: 225080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:06,363-Speed 3191.82 samples/sec Loss 0.7757 LearningRate 0.0009 Epoch: 18 Global Step: 225090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:09,504-Speed 3260.92 samples/sec Loss 0.8626 LearningRate 0.0009 Epoch: 18 Global Step: 225100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:12,628-Speed 3279.58 samples/sec Loss 0.8333 LearningRate 0.0009 Epoch: 18 Global Step: 225110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:15,822-Speed 3207.20 samples/sec Loss 0.8445 LearningRate 0.0009 Epoch: 18 Global Step: 225120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:18,937-Speed 3288.15 samples/sec Loss 0.8530 LearningRate 0.0009 Epoch: 18 Global Step: 225130 Fp16 Grad Scale: 16384 Required: 2 
hours Training: 2022-04-27 21:16:22,004-Speed 3340.02 samples/sec Loss 0.8570 LearningRate 0.0009 Epoch: 18 Global Step: 225140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:25,099-Speed 3310.05 samples/sec Loss 0.8653 LearningRate 0.0009 Epoch: 18 Global Step: 225150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:28,222-Speed 3279.86 samples/sec Loss 0.8220 LearningRate 0.0009 Epoch: 18 Global Step: 225160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:16:31,320-Speed 3306.59 samples/sec Loss 0.8777 LearningRate 0.0009 Epoch: 18 Global Step: 225170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:34,395-Speed 3330.72 samples/sec Loss 0.8225 LearningRate 0.0009 Epoch: 18 Global Step: 225180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:37,596-Speed 3199.83 samples/sec Loss 0.8140 LearningRate 0.0009 Epoch: 18 Global Step: 225190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:40,753-Speed 3245.05 samples/sec Loss 0.8139 LearningRate 0.0009 Epoch: 18 Global Step: 225200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:43,868-Speed 3287.82 samples/sec Loss 0.8256 LearningRate 0.0009 Epoch: 18 Global Step: 225210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:47,068-Speed 3201.18 samples/sec Loss 0.8249 LearningRate 0.0009 Epoch: 18 Global Step: 225220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:50,235-Speed 3234.02 samples/sec Loss 0.8554 LearningRate 0.0009 Epoch: 18 Global Step: 225230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:53,437-Speed 3199.36 samples/sec Loss 0.8244 LearningRate 0.0009 Epoch: 18 Global Step: 225240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:16:56,540-Speed 3301.22 samples/sec Loss 0.8794 LearningRate 0.0009 Epoch: 18 Global Step: 225250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
21:16:59,683-Speed 3258.52 samples/sec Loss 0.8775 LearningRate 0.0009 Epoch: 18 Global Step: 225260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:17:02,834-Speed 3250.81 samples/sec Loss 0.8430 LearningRate 0.0009 Epoch: 18 Global Step: 225270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:17:06,009-Speed 3226.46 samples/sec Loss 0.8284 LearningRate 0.0009 Epoch: 18 Global Step: 225280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:17:09,117-Speed 3295.72 samples/sec Loss 0.8677 LearningRate 0.0009 Epoch: 18 Global Step: 225290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:17:12,147-Speed 3380.18 samples/sec Loss 0.8156 LearningRate 0.0009 Epoch: 18 Global Step: 225300 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:15,346-Speed 3201.81 samples/sec Loss 0.8275 LearningRate 0.0009 Epoch: 18 Global Step: 225310 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:18,432-Speed 3320.03 samples/sec Loss 0.7926 LearningRate 0.0009 Epoch: 18 Global Step: 225320 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:21,504-Speed 3335.07 samples/sec Loss 0.8366 LearningRate 0.0009 Epoch: 18 Global Step: 225330 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:24,673-Speed 3232.06 samples/sec Loss 0.8521 LearningRate 0.0009 Epoch: 18 Global Step: 225340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:27,826-Speed 3248.15 samples/sec Loss 0.8559 LearningRate 0.0009 Epoch: 18 Global Step: 225350 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:30,974-Speed 3254.54 samples/sec Loss 0.8340 LearningRate 0.0009 Epoch: 18 Global Step: 225360 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:34,065-Speed 3313.19 samples/sec Loss 0.8658 LearningRate 0.0009 Epoch: 18 Global Step: 225370 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:37,218-Speed 3249.41 samples/sec Loss 0.8602 
LearningRate 0.0009 Epoch: 18 Global Step: 225380 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:40,386-Speed 3232.31 samples/sec Loss 0.8420 LearningRate 0.0009 Epoch: 18 Global Step: 225390 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:17:43,530-Speed 3258.58 samples/sec Loss 0.8483 LearningRate 0.0009 Epoch: 18 Global Step: 225400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:17:46,615-Speed 3319.87 samples/sec Loss 0.8789 LearningRate 0.0009 Epoch: 18 Global Step: 225410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:17:49,691-Speed 3330.61 samples/sec Loss 0.8800 LearningRate 0.0009 Epoch: 18 Global Step: 225420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:17:52,775-Speed 3321.35 samples/sec Loss 0.8705 LearningRate 0.0009 Epoch: 18 Global Step: 225430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:17:55,851-Speed 3329.90 samples/sec Loss 0.8769 LearningRate 0.0009 Epoch: 18 Global Step: 225440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:17:58,927-Speed 3330.40 samples/sec Loss 0.8466 LearningRate 0.0009 Epoch: 18 Global Step: 225450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:18:02,051-Speed 3279.39 samples/sec Loss 0.8873 LearningRate 0.0009 Epoch: 18 Global Step: 225460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:18:05,155-Speed 3299.80 samples/sec Loss 0.8306 LearningRate 0.0009 Epoch: 18 Global Step: 225470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:18:08,233-Speed 3328.64 samples/sec Loss 0.8572 LearningRate 0.0009 Epoch: 18 Global Step: 225480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:18:11,308-Speed 3330.99 samples/sec Loss 0.8681 LearningRate 0.0009 Epoch: 18 Global Step: 225490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:18:14,370-Speed 3344.45 samples/sec Loss 0.8668 LearningRate 0.0009 Epoch: 18 Global Step: 
225500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:18:17,543-Speed 3228.98 samples/sec Loss 0.8401 LearningRate 0.0009 Epoch: 18 Global Step: 225510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:18:20,577-Speed 3375.92 samples/sec Loss 0.8595 LearningRate 0.0008 Epoch: 18 Global Step: 225520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:23,650-Speed 3333.60 samples/sec Loss 0.8318 LearningRate 0.0008 Epoch: 18 Global Step: 225530 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:26,732-Speed 3324.29 samples/sec Loss 0.7889 LearningRate 0.0008 Epoch: 18 Global Step: 225540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:29,886-Speed 3247.05 samples/sec Loss 0.8315 LearningRate 0.0008 Epoch: 18 Global Step: 225550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:32,992-Speed 3297.42 samples/sec Loss 0.8871 LearningRate 0.0008 Epoch: 18 Global Step: 225560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:36,124-Speed 3271.25 samples/sec Loss 0.8451 LearningRate 0.0008 Epoch: 18 Global Step: 225570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:39,287-Speed 3238.13 samples/sec Loss 0.8517 LearningRate 0.0008 Epoch: 18 Global Step: 225580 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:42,451-Speed 3237.72 samples/sec Loss 0.8660 LearningRate 0.0008 Epoch: 18 Global Step: 225590 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:45,542-Speed 3313.87 samples/sec Loss 0.8821 LearningRate 0.0008 Epoch: 18 Global Step: 225600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:48,699-Speed 3243.85 samples/sec Loss 0.8839 LearningRate 0.0008 Epoch: 18 Global Step: 225610 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:18:51,826-Speed 3276.51 samples/sec Loss 0.8304 LearningRate 0.0008 Epoch: 18 Global Step: 225620 Fp16 Grad Scale: 16384 Required: 2 hours 
Training: 2022-04-27 21:18:54,995-Speed 3232.62 samples/sec Loss 0.8916 LearningRate 0.0008 Epoch: 18 Global Step: 225630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:18:58,109-Speed 3289.28 samples/sec Loss 0.8417 LearningRate 0.0008 Epoch: 18 Global Step: 225640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:01,301-Speed 3209.10 samples/sec Loss 0.8357 LearningRate 0.0008 Epoch: 18 Global Step: 225650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:04,494-Speed 3207.26 samples/sec Loss 0.8620 LearningRate 0.0008 Epoch: 18 Global Step: 225660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:07,627-Speed 3269.88 samples/sec Loss 0.8209 LearningRate 0.0008 Epoch: 18 Global Step: 225670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:10,691-Speed 3342.60 samples/sec Loss 0.8615 LearningRate 0.0008 Epoch: 18 Global Step: 225680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:13,915-Speed 3178.32 samples/sec Loss 0.8577 LearningRate 0.0008 Epoch: 18 Global Step: 225690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:17,028-Speed 3289.70 samples/sec Loss 0.8300 LearningRate 0.0008 Epoch: 18 Global Step: 225700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:20,130-Speed 3302.56 samples/sec Loss 0.8552 LearningRate 0.0008 Epoch: 18 Global Step: 225710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:19:23,267-Speed 3265.34 samples/sec Loss 0.8622 LearningRate 0.0008 Epoch: 18 Global Step: 225720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:26,367-Speed 3304.76 samples/sec Loss 0.8226 LearningRate 0.0008 Epoch: 18 Global Step: 225730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:29,472-Speed 3298.34 samples/sec Loss 0.8398 LearningRate 0.0008 Epoch: 18 Global Step: 225740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:32,565-Speed 
3312.24 samples/sec Loss 0.8676 LearningRate 0.0008 Epoch: 18 Global Step: 225750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:35,681-Speed 3287.41 samples/sec Loss 0.8139 LearningRate 0.0008 Epoch: 18 Global Step: 225760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:38,834-Speed 3247.89 samples/sec Loss 0.8372 LearningRate 0.0008 Epoch: 18 Global Step: 225770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:42,580-Speed 2734.94 samples/sec Loss 0.8276 LearningRate 0.0008 Epoch: 18 Global Step: 225780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:45,696-Speed 3287.34 samples/sec Loss 0.8542 LearningRate 0.0008 Epoch: 18 Global Step: 225790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:48,775-Speed 3326.38 samples/sec Loss 0.8463 LearningRate 0.0008 Epoch: 18 Global Step: 225800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:51,873-Speed 3306.92 samples/sec Loss 0.8248 LearningRate 0.0008 Epoch: 18 Global Step: 225810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:19:55,019-Speed 3256.30 samples/sec Loss 0.8387 LearningRate 0.0008 Epoch: 18 Global Step: 225820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 21:19:58,114-Speed 3309.15 samples/sec Loss 0.8350 LearningRate 0.0008 Epoch: 18 Global Step: 225830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:01,215-Speed 3303.17 samples/sec Loss 0.8622 LearningRate 0.0008 Epoch: 18 Global Step: 225840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:04,305-Speed 3315.80 samples/sec Loss 0.8400 LearningRate 0.0008 Epoch: 18 Global Step: 225850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:07,417-Speed 3290.82 samples/sec Loss 0.8135 LearningRate 0.0008 Epoch: 18 Global Step: 225860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:10,478-Speed 3346.36 samples/sec Loss 0.8404 
LearningRate 0.0008 Epoch: 18 Global Step: 225870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:13,557-Speed 3327.16 samples/sec Loss 0.8501 LearningRate 0.0008 Epoch: 18 Global Step: 225880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:16,812-Speed 3146.99 samples/sec Loss 0.8424 LearningRate 0.0008 Epoch: 18 Global Step: 225890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:19,922-Speed 3292.82 samples/sec Loss 0.8136 LearningRate 0.0008 Epoch: 18 Global Step: 225900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:23,014-Speed 3313.06 samples/sec Loss 0.7935 LearningRate 0.0008 Epoch: 18 Global Step: 225910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:26,166-Speed 3250.48 samples/sec Loss 0.8459 LearningRate 0.0008 Epoch: 18 Global Step: 225920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:29,285-Speed 3284.19 samples/sec Loss 0.8718 LearningRate 0.0008 Epoch: 18 Global Step: 225930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:32,391-Speed 3297.98 samples/sec Loss 0.8200 LearningRate 0.0008 Epoch: 18 Global Step: 225940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:35,462-Speed 3334.62 samples/sec Loss 0.8307 LearningRate 0.0008 Epoch: 18 Global Step: 225950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:38,619-Speed 3244.61 samples/sec Loss 0.8452 LearningRate 0.0008 Epoch: 18 Global Step: 225960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:41,808-Speed 3212.81 samples/sec Loss 0.8207 LearningRate 0.0008 Epoch: 18 Global Step: 225970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:44,896-Speed 3316.79 samples/sec Loss 0.8628 LearningRate 0.0008 Epoch: 18 Global Step: 225980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:47,992-Speed 3309.11 samples/sec Loss 0.8685 LearningRate 0.0008 Epoch: 18 Global Step: 
225990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:51,110-Speed 3284.55 samples/sec Loss 0.8293 LearningRate 0.0008 Epoch: 18 Global Step: 226000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:20:54,227-Speed 3287.15 samples/sec Loss 0.8385 LearningRate 0.0008 Epoch: 18 Global Step: 226010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:20:57,286-Speed 3348.12 samples/sec Loss 0.8344 LearningRate 0.0008 Epoch: 18 Global Step: 226020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:00,425-Speed 3263.41 samples/sec Loss 0.8434 LearningRate 0.0008 Epoch: 18 Global Step: 226030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:03,518-Speed 3311.55 samples/sec Loss 0.8466 LearningRate 0.0008 Epoch: 18 Global Step: 226040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:06,625-Speed 3296.35 samples/sec Loss 0.8232 LearningRate 0.0008 Epoch: 18 Global Step: 226050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:09,685-Speed 3348.33 samples/sec Loss 0.8783 LearningRate 0.0008 Epoch: 18 Global Step: 226060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:12,853-Speed 3232.95 samples/sec Loss 0.8340 LearningRate 0.0008 Epoch: 18 Global Step: 226070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:16,647-Speed 2699.74 samples/sec Loss 0.8474 LearningRate 0.0008 Epoch: 18 Global Step: 226080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:19,761-Speed 3289.09 samples/sec Loss 0.8670 LearningRate 0.0008 Epoch: 18 Global Step: 226090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:22,861-Speed 3304.28 samples/sec Loss 0.8671 LearningRate 0.0008 Epoch: 18 Global Step: 226100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:26,692-Speed 2673.45 samples/sec Loss 0.8450 LearningRate 0.0008 Epoch: 18 Global Step: 226110 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-27 21:21:30,351-Speed 2799.38 samples/sec Loss 0.8232 LearningRate 0.0008 Epoch: 18 Global Step: 226120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:21:33,414-Speed 3345.17 samples/sec Loss 0.8865 LearningRate 0.0008 Epoch: 18 Global Step: 226130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:21:36,516-Speed 3301.82 samples/sec Loss 0.8263 LearningRate 0.0008 Epoch: 18 Global Step: 226140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:21:39,631-Speed 3288.54 samples/sec Loss 0.8382 LearningRate 0.0008 Epoch: 18 Global Step: 226150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:42,788-Speed 3244.36 samples/sec Loss 0.8344 LearningRate 0.0008 Epoch: 18 Global Step: 226160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:45,862-Speed 3332.51 samples/sec Loss 0.8660 LearningRate 0.0008 Epoch: 18 Global Step: 226170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:48,964-Speed 3301.99 samples/sec Loss 0.8438 LearningRate 0.0008 Epoch: 18 Global Step: 226180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:52,036-Speed 3334.65 samples/sec Loss 0.8410 LearningRate 0.0008 Epoch: 18 Global Step: 226190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:55,223-Speed 3214.00 samples/sec Loss 0.8669 LearningRate 0.0008 Epoch: 18 Global Step: 226200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:21:58,299-Speed 3330.82 samples/sec Loss 0.8245 LearningRate 0.0008 Epoch: 18 Global Step: 226210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:01,494-Speed 3205.66 samples/sec Loss 0.8399 LearningRate 0.0008 Epoch: 18 Global Step: 226220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:04,669-Speed 3226.77 samples/sec Loss 0.8704 LearningRate 0.0008 Epoch: 18 Global Step: 226230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
21:22:07,761-Speed 3312.20 samples/sec Loss 0.8569 LearningRate 0.0008 Epoch: 18 Global Step: 226240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:10,822-Speed 3346.67 samples/sec Loss 0.8068 LearningRate 0.0008 Epoch: 18 Global Step: 226250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:22:13,892-Speed 3336.11 samples/sec Loss 0.8463 LearningRate 0.0008 Epoch: 18 Global Step: 226260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:16,974-Speed 3324.58 samples/sec Loss 0.8635 LearningRate 0.0008 Epoch: 18 Global Step: 226270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:20,047-Speed 3333.23 samples/sec Loss 0.8438 LearningRate 0.0008 Epoch: 18 Global Step: 226280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:23,150-Speed 3300.41 samples/sec Loss 0.8111 LearningRate 0.0008 Epoch: 18 Global Step: 226290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:26,322-Speed 3229.94 samples/sec Loss 0.8687 LearningRate 0.0008 Epoch: 18 Global Step: 226300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:29,510-Speed 3212.28 samples/sec Loss 0.8436 LearningRate 0.0008 Epoch: 18 Global Step: 226310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:32,681-Speed 3231.38 samples/sec Loss 0.8194 LearningRate 0.0008 Epoch: 18 Global Step: 226320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:35,801-Speed 3283.19 samples/sec Loss 0.8478 LearningRate 0.0008 Epoch: 18 Global Step: 226330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:38,883-Speed 3323.34 samples/sec Loss 0.8498 LearningRate 0.0008 Epoch: 18 Global Step: 226340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:42,022-Speed 3262.84 samples/sec Loss 0.8809 LearningRate 0.0008 Epoch: 18 Global Step: 226350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:45,091-Speed 3337.72 samples/sec Loss 
0.8452 LearningRate 0.0008 Epoch: 18 Global Step: 226360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:48,188-Speed 3307.75 samples/sec Loss 0.8392 LearningRate 0.0008 Epoch: 18 Global Step: 226370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:51,262-Speed 3331.72 samples/sec Loss 0.8329 LearningRate 0.0008 Epoch: 18 Global Step: 226380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:54,363-Speed 3303.18 samples/sec Loss 0.8559 LearningRate 0.0008 Epoch: 18 Global Step: 226390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:22:57,438-Speed 3331.18 samples/sec Loss 0.8923 LearningRate 0.0008 Epoch: 18 Global Step: 226400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:00,554-Speed 3287.79 samples/sec Loss 0.8473 LearningRate 0.0008 Epoch: 18 Global Step: 226410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:03,617-Speed 3344.80 samples/sec Loss 0.8482 LearningRate 0.0008 Epoch: 18 Global Step: 226420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:06,728-Speed 3292.49 samples/sec Loss 0.8520 LearningRate 0.0008 Epoch: 18 Global Step: 226430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:09,808-Speed 3324.74 samples/sec Loss 0.8467 LearningRate 0.0008 Epoch: 18 Global Step: 226440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:12,956-Speed 3254.74 samples/sec Loss 0.8768 LearningRate 0.0008 Epoch: 18 Global Step: 226450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:16,084-Speed 3274.50 samples/sec Loss 0.8777 LearningRate 0.0008 Epoch: 18 Global Step: 226460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:23:19,169-Speed 3320.12 samples/sec Loss 0.8471 LearningRate 0.0008 Epoch: 18 Global Step: 226470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:23:22,224-Speed 3352.84 samples/sec Loss 0.8711 LearningRate 0.0008 Epoch: 18 Global 
Step: 226480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:25,282-Speed 3349.25 samples/sec Loss 0.8714 LearningRate 0.0008 Epoch: 18 Global Step: 226490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:28,380-Speed 3307.02 samples/sec Loss 0.8549 LearningRate 0.0008 Epoch: 18 Global Step: 226500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:31,442-Speed 3345.09 samples/sec Loss 0.8494 LearningRate 0.0008 Epoch: 18 Global Step: 226510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:34,531-Speed 3315.53 samples/sec Loss 0.8519 LearningRate 0.0008 Epoch: 18 Global Step: 226520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:23:37,628-Speed 3308.67 samples/sec Loss 0.8390 LearningRate 0.0008 Epoch: 18 Global Step: 226530 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:23:40,787-Speed 3242.29 samples/sec Loss 0.8013 LearningRate 0.0008 Epoch: 18 Global Step: 226540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:23:43,922-Speed 3266.95 samples/sec Loss 0.8188 LearningRate 0.0008 Epoch: 18 Global Step: 226550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:23:46,996-Speed 3332.69 samples/sec Loss 0.8284 LearningRate 0.0008 Epoch: 18 Global Step: 226560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:23:50,049-Speed 3354.61 samples/sec Loss 0.8868 LearningRate 0.0008 Epoch: 18 Global Step: 226570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:23:53,123-Speed 3331.97 samples/sec Loss 0.8500 LearningRate 0.0008 Epoch: 18 Global Step: 226580 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:23:56,200-Speed 3330.05 samples/sec Loss 0.8509 LearningRate 0.0008 Epoch: 18 Global Step: 226590 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:23:59,279-Speed 3326.52 samples/sec Loss 0.8466 LearningRate 0.0008 Epoch: 18 Global Step: 226600 Fp16 Grad Scale: 8192 Required: 2 
hours Training: 2022-04-27 21:24:02,395-Speed 3287.26 samples/sec Loss 0.8344 LearningRate 0.0008 Epoch: 18 Global Step: 226610 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:05,541-Speed 3256.01 samples/sec Loss 0.8904 LearningRate 0.0008 Epoch: 18 Global Step: 226620 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:08,613-Speed 3333.94 samples/sec Loss 0.8315 LearningRate 0.0008 Epoch: 18 Global Step: 226630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:24:11,661-Speed 3360.72 samples/sec Loss 0.8870 LearningRate 0.0008 Epoch: 18 Global Step: 226640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:24:14,797-Speed 3266.24 samples/sec Loss 0.8139 LearningRate 0.0008 Epoch: 18 Global Step: 226650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:24:17,932-Speed 3267.79 samples/sec Loss 0.8600 LearningRate 0.0008 Epoch: 18 Global Step: 226660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:24:21,003-Speed 3335.53 samples/sec Loss 0.8505 LearningRate 0.0008 Epoch: 18 Global Step: 226670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:24:24,120-Speed 3285.59 samples/sec Loss 0.8468 LearningRate 0.0008 Epoch: 18 Global Step: 226680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:24:27,200-Speed 3325.71 samples/sec Loss 0.8052 LearningRate 0.0008 Epoch: 18 Global Step: 226690 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:30,341-Speed 3261.32 samples/sec Loss 0.8252 LearningRate 0.0008 Epoch: 18 Global Step: 226700 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:33,462-Speed 3282.71 samples/sec Loss 0.8644 LearningRate 0.0008 Epoch: 18 Global Step: 226710 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:36,524-Speed 3344.77 samples/sec Loss 0.8401 LearningRate 0.0008 Epoch: 18 Global Step: 226720 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:39,598-Speed 
3332.16 samples/sec Loss 0.8643 LearningRate 0.0008 Epoch: 18 Global Step: 226730 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:42,668-Speed 3337.06 samples/sec Loss 0.8859 LearningRate 0.0008 Epoch: 18 Global Step: 226740 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:45,747-Speed 3326.12 samples/sec Loss 0.8422 LearningRate 0.0008 Epoch: 18 Global Step: 226750 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:48,829-Speed 3323.79 samples/sec Loss 0.8622 LearningRate 0.0008 Epoch: 18 Global Step: 226760 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:51,975-Speed 3255.95 samples/sec Loss 0.8126 LearningRate 0.0008 Epoch: 18 Global Step: 226770 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:55,120-Speed 3256.61 samples/sec Loss 0.8926 LearningRate 0.0008 Epoch: 18 Global Step: 226780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:24:58,236-Speed 3288.07 samples/sec Loss 0.8591 LearningRate 0.0008 Epoch: 18 Global Step: 226790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:01,327-Speed 3313.49 samples/sec Loss 0.8386 LearningRate 0.0008 Epoch: 18 Global Step: 226800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:04,434-Speed 3297.10 samples/sec Loss 0.8182 LearningRate 0.0008 Epoch: 18 Global Step: 226810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:07,507-Speed 3333.04 samples/sec Loss 0.8607 LearningRate 0.0008 Epoch: 18 Global Step: 226820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:10,618-Speed 3292.32 samples/sec Loss 0.8551 LearningRate 0.0008 Epoch: 18 Global Step: 226830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:13,761-Speed 3259.09 samples/sec Loss 0.8300 LearningRate 0.0008 Epoch: 18 Global Step: 226840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:16,911-Speed 3251.74 samples/sec Loss 0.8190 LearningRate 
0.0008 Epoch: 18 Global Step: 226850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:19,983-Speed 3334.66 samples/sec Loss 0.8540 LearningRate 0.0008 Epoch: 18 Global Step: 226860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:23,044-Speed 3346.30 samples/sec Loss 0.8159 LearningRate 0.0008 Epoch: 18 Global Step: 226870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:26,188-Speed 3257.95 samples/sec Loss 0.8588 LearningRate 0.0008 Epoch: 18 Global Step: 226880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:29,415-Speed 3173.90 samples/sec Loss 0.8474 LearningRate 0.0008 Epoch: 18 Global Step: 226890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:25:32,516-Speed 3303.44 samples/sec Loss 0.8421 LearningRate 0.0008 Epoch: 18 Global Step: 226900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:25:35,602-Speed 3319.38 samples/sec Loss 0.8365 LearningRate 0.0007 Epoch: 18 Global Step: 226910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:25:38,646-Speed 3364.57 samples/sec Loss 0.8278 LearningRate 0.0007 Epoch: 18 Global Step: 226920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:41,779-Speed 3269.93 samples/sec Loss 0.8447 LearningRate 0.0007 Epoch: 18 Global Step: 226930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:44,867-Speed 3316.91 samples/sec Loss 0.8882 LearningRate 0.0007 Epoch: 18 Global Step: 226940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:48,084-Speed 3184.27 samples/sec Loss 0.8782 LearningRate 0.0007 Epoch: 18 Global Step: 226950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:51,216-Speed 3270.52 samples/sec Loss 0.8370 LearningRate 0.0007 Epoch: 18 Global Step: 226960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:54,394-Speed 3223.68 samples/sec Loss 0.8582 LearningRate 0.0007 Epoch: 18 Global Step: 226970 Fp16 
Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:25:57,477-Speed 3322.10 samples/sec Loss 0.8437 LearningRate 0.0007 Epoch: 18 Global Step: 226980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:00,570-Speed 3311.94 samples/sec Loss 0.8419 LearningRate 0.0007 Epoch: 18 Global Step: 226990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:03,685-Speed 3287.96 samples/sec Loss 0.8561 LearningRate 0.0007 Epoch: 18 Global Step: 227000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:06,763-Speed 3327.96 samples/sec Loss 0.7786 LearningRate 0.0007 Epoch: 18 Global Step: 227010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:09,858-Speed 3309.41 samples/sec Loss 0.8182 LearningRate 0.0007 Epoch: 18 Global Step: 227020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:13,036-Speed 3223.22 samples/sec Loss 0.8692 LearningRate 0.0007 Epoch: 18 Global Step: 227030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:16,111-Speed 3331.52 samples/sec Loss 0.8679 LearningRate 0.0007 Epoch: 18 Global Step: 227040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:19,232-Speed 3281.47 samples/sec Loss 0.8614 LearningRate 0.0007 Epoch: 18 Global Step: 227050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:22,280-Speed 3361.12 samples/sec Loss 0.8270 LearningRate 0.0007 Epoch: 18 Global Step: 227060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:25,485-Speed 3195.95 samples/sec Loss 0.8405 LearningRate 0.0007 Epoch: 18 Global Step: 227070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:28,626-Speed 3260.59 samples/sec Loss 0.8531 LearningRate 0.0007 Epoch: 18 Global Step: 227080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:31,707-Speed 3325.34 samples/sec Loss 0.8281 LearningRate 0.0007 Epoch: 18 Global Step: 227090 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-27 21:26:34,871-Speed 3236.28 samples/sec Loss 0.8490 LearningRate 0.0007 Epoch: 18 Global Step: 227100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:38,013-Speed 3260.71 samples/sec Loss 0.8909 LearningRate 0.0007 Epoch: 18 Global Step: 227110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:41,077-Speed 3343.72 samples/sec Loss 0.8112 LearningRate 0.0007 Epoch: 18 Global Step: 227120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 21:26:44,168-Speed 3314.12 samples/sec Loss 0.8618 LearningRate 0.0007 Epoch: 18 Global Step: 227130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:26:47,233-Speed 3341.53 samples/sec Loss 0.8566 LearningRate 0.0007 Epoch: 18 Global Step: 227140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:50,290-Speed 3350.80 samples/sec Loss 0.8690 LearningRate 0.0007 Epoch: 18 Global Step: 227150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:53,423-Speed 3269.55 samples/sec Loss 0.7998 LearningRate 0.0007 Epoch: 18 Global Step: 227160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:56,576-Speed 3248.59 samples/sec Loss 0.8544 LearningRate 0.0007 Epoch: 18 Global Step: 227170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:26:59,657-Speed 3324.70 samples/sec Loss 0.8549 LearningRate 0.0007 Epoch: 18 Global Step: 227180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:02,754-Speed 3307.43 samples/sec Loss 0.8526 LearningRate 0.0007 Epoch: 18 Global Step: 227190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:05,866-Speed 3291.87 samples/sec Loss 0.8748 LearningRate 0.0007 Epoch: 18 Global Step: 227200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:08,927-Speed 3347.04 samples/sec Loss 0.8293 LearningRate 0.0007 Epoch: 18 Global Step: 227210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:12,015-Speed 
3316.49 samples/sec Loss 0.8613 LearningRate 0.0007 Epoch: 18 Global Step: 227220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:15,109-Speed 3310.92 samples/sec Loss 0.8745 LearningRate 0.0007 Epoch: 18 Global Step: 227230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:18,224-Speed 3288.56 samples/sec Loss 0.8790 LearningRate 0.0007 Epoch: 18 Global Step: 227240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:21,275-Speed 3357.09 samples/sec Loss 0.8471 LearningRate 0.0007 Epoch: 18 Global Step: 227250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:24,351-Speed 3330.84 samples/sec Loss 0.8497 LearningRate 0.0007 Epoch: 18 Global Step: 227260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:27,506-Speed 3245.86 samples/sec Loss 0.8468 LearningRate 0.0007 Epoch: 18 Global Step: 227270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:30,677-Speed 3230.09 samples/sec Loss 0.8473 LearningRate 0.0007 Epoch: 18 Global Step: 227280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:33,741-Speed 3343.05 samples/sec Loss 0.8584 LearningRate 0.0007 Epoch: 18 Global Step: 227290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:36,799-Speed 3350.07 samples/sec Loss 0.8709 LearningRate 0.0007 Epoch: 18 Global Step: 227300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:39,906-Speed 3296.49 samples/sec Loss 0.8781 LearningRate 0.0007 Epoch: 18 Global Step: 227310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:43,026-Speed 3283.35 samples/sec Loss 0.8462 LearningRate 0.0007 Epoch: 18 Global Step: 227320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:46,097-Speed 3336.04 samples/sec Loss 0.8407 LearningRate 0.0007 Epoch: 18 Global Step: 227330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:49,174-Speed 3329.19 samples/sec Loss 0.8687 
LearningRate 0.0007 Epoch: 18 Global Step: 227340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:27:52,221-Speed 3360.82 samples/sec Loss 0.8518 LearningRate 0.0007 Epoch: 18 Global Step: 227350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:55,296-Speed 3331.33 samples/sec Loss 0.8423 LearningRate 0.0007 Epoch: 18 Global Step: 227360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:27:58,390-Speed 3310.70 samples/sec Loss 0.8501 LearningRate 0.0007 Epoch: 18 Global Step: 227370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:01,510-Speed 3283.33 samples/sec Loss 0.8530 LearningRate 0.0007 Epoch: 18 Global Step: 227380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:04,588-Speed 3327.46 samples/sec Loss 0.8313 LearningRate 0.0007 Epoch: 18 Global Step: 227390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:07,666-Speed 3328.31 samples/sec Loss 0.8711 LearningRate 0.0007 Epoch: 18 Global Step: 227400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:10,728-Speed 3344.95 samples/sec Loss 0.8346 LearningRate 0.0007 Epoch: 18 Global Step: 227410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:13,802-Speed 3333.00 samples/sec Loss 0.8682 LearningRate 0.0007 Epoch: 18 Global Step: 227420 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:16,905-Speed 3301.21 samples/sec Loss 0.8349 LearningRate 0.0007 Epoch: 18 Global Step: 227430 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:19,968-Speed 3344.19 samples/sec Loss 0.8314 LearningRate 0.0007 Epoch: 18 Global Step: 227440 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:23,078-Speed 3293.72 samples/sec Loss 0.8651 LearningRate 0.0007 Epoch: 18 Global Step: 227450 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:26,199-Speed 3281.97 samples/sec Loss 0.8953 LearningRate 0.0007 Epoch: 18 Global Step: 
227460 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:29,373-Speed 3226.67 samples/sec Loss 0.8484 LearningRate 0.0007 Epoch: 18 Global Step: 227470 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:32,432-Speed 3348.78 samples/sec Loss 0.8180 LearningRate 0.0007 Epoch: 18 Global Step: 227480 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:35,538-Speed 3298.03 samples/sec Loss 0.8351 LearningRate 0.0007 Epoch: 18 Global Step: 227490 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:38,650-Speed 3291.92 samples/sec Loss 0.8401 LearningRate 0.0007 Epoch: 18 Global Step: 227500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:41,708-Speed 3349.68 samples/sec Loss 0.8693 LearningRate 0.0007 Epoch: 18 Global Step: 227510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:28:44,770-Speed 3345.63 samples/sec Loss 0.8686 LearningRate 0.0007 Epoch: 18 Global Step: 227520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:47,871-Speed 3303.30 samples/sec Loss 0.8175 LearningRate 0.0007 Epoch: 18 Global Step: 227530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:51,020-Speed 3252.43 samples/sec Loss 0.8362 LearningRate 0.0007 Epoch: 18 Global Step: 227540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:54,128-Speed 3295.96 samples/sec Loss 0.8459 LearningRate 0.0007 Epoch: 18 Global Step: 227550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:28:57,234-Speed 3297.55 samples/sec Loss 0.8357 LearningRate 0.0007 Epoch: 18 Global Step: 227560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:00,346-Speed 3292.25 samples/sec Loss 0.8727 LearningRate 0.0007 Epoch: 18 Global Step: 227570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:03,403-Speed 3350.09 samples/sec Loss 0.7976 LearningRate 0.0007 Epoch: 18 Global Step: 227580 Fp16 Grad Scale: 16384 Required: 2 hours 
Training: 2022-04-27 21:29:06,541-Speed 3264.23 samples/sec Loss 0.8457 LearningRate 0.0007 Epoch: 18 Global Step: 227590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:09,603-Speed 3346.10 samples/sec Loss 0.8131 LearningRate 0.0007 Epoch: 18 Global Step: 227600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:12,724-Speed 3282.19 samples/sec Loss 0.8635 LearningRate 0.0007 Epoch: 18 Global Step: 227610 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:15,831-Speed 3296.96 samples/sec Loss 0.8169 LearningRate 0.0007 Epoch: 18 Global Step: 227620 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:18,892-Speed 3346.45 samples/sec Loss 0.8206 LearningRate 0.0007 Epoch: 18 Global Step: 227630 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:21,949-Speed 3349.91 samples/sec Loss 0.8465 LearningRate 0.0007 Epoch: 18 Global Step: 227640 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:25,746-Speed 2697.41 samples/sec Loss 0.8808 LearningRate 0.0007 Epoch: 18 Global Step: 227650 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:28,859-Speed 3290.76 samples/sec Loss 0.8490 LearningRate 0.0007 Epoch: 18 Global Step: 227660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:31,947-Speed 3317.04 samples/sec Loss 0.8235 LearningRate 0.0007 Epoch: 18 Global Step: 227670 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:35,110-Speed 3239.04 samples/sec Loss 0.8239 LearningRate 0.0007 Epoch: 18 Global Step: 227680 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:38,225-Speed 3288.07 samples/sec Loss 0.8357 LearningRate 0.0007 Epoch: 18 Global Step: 227690 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:29:41,328-Speed 3300.85 samples/sec Loss 0.8250 LearningRate 0.0007 Epoch: 18 Global Step: 227700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:44,487-Speed 3242.42 
samples/sec Loss 0.8407 LearningRate 0.0007 Epoch: 18 Global Step: 227710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:47,571-Speed 3321.70 samples/sec Loss 0.8603 LearningRate 0.0007 Epoch: 18 Global Step: 227720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:50,700-Speed 3273.54 samples/sec Loss 0.7889 LearningRate 0.0007 Epoch: 18 Global Step: 227730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:53,951-Speed 3150.70 samples/sec Loss 0.8419 LearningRate 0.0007 Epoch: 18 Global Step: 227740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:29:57,033-Speed 3324.45 samples/sec Loss 0.8396 LearningRate 0.0007 Epoch: 18 Global Step: 227750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:00,113-Speed 3325.27 samples/sec Loss 0.8077 LearningRate 0.0007 Epoch: 18 Global Step: 227760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:03,169-Speed 3351.86 samples/sec Loss 0.8649 LearningRate 0.0007 Epoch: 18 Global Step: 227770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:06,308-Speed 3263.02 samples/sec Loss 0.8437 LearningRate 0.0007 Epoch: 18 Global Step: 227780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:09,368-Speed 3347.19 samples/sec Loss 0.8419 LearningRate 0.0007 Epoch: 18 Global Step: 227790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:12,458-Speed 3315.37 samples/sec Loss 0.8376 LearningRate 0.0007 Epoch: 18 Global Step: 227800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:30:15,607-Speed 3253.37 samples/sec Loss 0.8133 LearningRate 0.0007 Epoch: 18 Global Step: 227810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:30:18,711-Speed 3299.43 samples/sec Loss 0.7974 LearningRate 0.0007 Epoch: 18 Global Step: 227820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:21,770-Speed 3348.82 samples/sec Loss 0.8577 LearningRate 0.0007 
Epoch: 18 Global Step: 227830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:24,894-Speed 3278.58 samples/sec Loss 0.8634 LearningRate 0.0007 Epoch: 18 Global Step: 227840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:27,951-Speed 3350.50 samples/sec Loss 0.8523 LearningRate 0.0007 Epoch: 18 Global Step: 227850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:31,043-Speed 3313.56 samples/sec Loss 0.8344 LearningRate 0.0007 Epoch: 18 Global Step: 227860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:34,101-Speed 3349.55 samples/sec Loss 0.8426 LearningRate 0.0007 Epoch: 18 Global Step: 227870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:37,238-Speed 3264.95 samples/sec Loss 0.8356 LearningRate 0.0007 Epoch: 18 Global Step: 227880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:40,311-Speed 3332.86 samples/sec Loss 0.9002 LearningRate 0.0007 Epoch: 18 Global Step: 227890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:43,462-Speed 3250.47 samples/sec Loss 0.8365 LearningRate 0.0007 Epoch: 18 Global Step: 227900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:30:46,521-Speed 3348.93 samples/sec Loss 0.8500 LearningRate 0.0007 Epoch: 18 Global Step: 227910 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:30:49,607-Speed 3319.44 samples/sec Loss 0.8203 LearningRate 0.0007 Epoch: 18 Global Step: 227920 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:30:52,663-Speed 3351.86 samples/sec Loss 0.8168 LearningRate 0.0007 Epoch: 18 Global Step: 227930 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:30:55,734-Speed 3335.00 samples/sec Loss 0.8446 LearningRate 0.0007 Epoch: 18 Global Step: 227940 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:30:58,793-Speed 3348.69 samples/sec Loss 0.8569 LearningRate 0.0007 Epoch: 18 Global Step: 227950 Fp16 Grad Scale: 
8192 Required: 2 hours Training: 2022-04-27 21:31:01,870-Speed 3329.21 samples/sec Loss 0.8411 LearningRate 0.0007 Epoch: 18 Global Step: 227960 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:04,959-Speed 3316.30 samples/sec Loss 0.8572 LearningRate 0.0007 Epoch: 18 Global Step: 227970 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:08,065-Speed 3297.09 samples/sec Loss 0.8329 LearningRate 0.0007 Epoch: 18 Global Step: 227980 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:11,131-Speed 3341.02 samples/sec Loss 0.8761 LearningRate 0.0007 Epoch: 18 Global Step: 227990 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:14,248-Speed 3286.61 samples/sec Loss 0.8209 LearningRate 0.0007 Epoch: 18 Global Step: 228000 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:17,303-Speed 3353.29 samples/sec Loss 0.8725 LearningRate 0.0007 Epoch: 18 Global Step: 228010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:31:20,364-Speed 3355.21 samples/sec Loss 0.8290 LearningRate 0.0007 Epoch: 18 Global Step: 228020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:31:23,465-Speed 3302.14 samples/sec Loss 0.8383 LearningRate 0.0007 Epoch: 18 Global Step: 228030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:31:26,499-Speed 3376.32 samples/sec Loss 0.8223 LearningRate 0.0007 Epoch: 18 Global Step: 228040 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:29,670-Speed 3230.31 samples/sec Loss 0.8508 LearningRate 0.0007 Epoch: 18 Global Step: 228050 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:32,755-Speed 3321.29 samples/sec Loss 0.8449 LearningRate 0.0007 Epoch: 18 Global Step: 228060 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:35,897-Speed 3260.01 samples/sec Loss 0.8615 LearningRate 0.0007 Epoch: 18 Global Step: 228070 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 
21:31:38,955-Speed 3349.50 samples/sec Loss 0.8402 LearningRate 0.0007 Epoch: 18 Global Step: 228080 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:42,009-Speed 3353.43 samples/sec Loss 0.8507 LearningRate 0.0007 Epoch: 18 Global Step: 228090 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:45,100-Speed 3314.62 samples/sec Loss 0.8327 LearningRate 0.0007 Epoch: 18 Global Step: 228100 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:48,238-Speed 3264.24 samples/sec Loss 0.8424 LearningRate 0.0007 Epoch: 18 Global Step: 228110 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:51,296-Speed 3349.85 samples/sec Loss 0.8364 LearningRate 0.0007 Epoch: 18 Global Step: 228120 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:54,402-Speed 3297.51 samples/sec Loss 0.8839 LearningRate 0.0007 Epoch: 18 Global Step: 228130 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:31:57,455-Speed 3354.94 samples/sec Loss 0.8270 LearningRate 0.0007 Epoch: 18 Global Step: 228140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:00,526-Speed 3336.16 samples/sec Loss 0.8508 LearningRate 0.0007 Epoch: 18 Global Step: 228150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:03,722-Speed 3204.38 samples/sec Loss 0.7981 LearningRate 0.0007 Epoch: 18 Global Step: 228160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:06,850-Speed 3275.09 samples/sec Loss 0.8220 LearningRate 0.0007 Epoch: 18 Global Step: 228170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:09,941-Speed 3313.60 samples/sec Loss 0.8326 LearningRate 0.0007 Epoch: 18 Global Step: 228180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:13,057-Speed 3287.39 samples/sec Loss 0.8481 LearningRate 0.0007 Epoch: 18 Global Step: 228190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:16,146-Speed 3316.32 samples/sec Loss 
0.8882 LearningRate 0.0007 Epoch: 18 Global Step: 228200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:19,310-Speed 3237.26 samples/sec Loss 0.8707 LearningRate 0.0007 Epoch: 18 Global Step: 228210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:22,403-Speed 3311.33 samples/sec Loss 0.8315 LearningRate 0.0007 Epoch: 18 Global Step: 228220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:25,521-Speed 3285.99 samples/sec Loss 0.8641 LearningRate 0.0007 Epoch: 18 Global Step: 228230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:28,687-Speed 3235.04 samples/sec Loss 0.8281 LearningRate 0.0007 Epoch: 18 Global Step: 228240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:32:31,806-Speed 3284.20 samples/sec Loss 0.8765 LearningRate 0.0007 Epoch: 18 Global Step: 228250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:32:34,910-Speed 3299.71 samples/sec Loss 0.8256 LearningRate 0.0007 Epoch: 18 Global Step: 228260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:32:38,033-Speed 3280.10 samples/sec Loss 0.8457 LearningRate 0.0007 Epoch: 18 Global Step: 228270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:32:41,133-Speed 3303.79 samples/sec Loss 0.8677 LearningRate 0.0007 Epoch: 18 Global Step: 228280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:44,255-Speed 3281.40 samples/sec Loss 0.8191 LearningRate 0.0007 Epoch: 18 Global Step: 228290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:47,373-Speed 3284.59 samples/sec Loss 0.8245 LearningRate 0.0007 Epoch: 18 Global Step: 228300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:50,556-Speed 3218.25 samples/sec Loss 0.8186 LearningRate 0.0007 Epoch: 18 Global Step: 228310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:53,657-Speed 3303.30 samples/sec Loss 0.8183 LearningRate 0.0007 Epoch: 18 Global 
Step: 228320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:56,716-Speed 3348.36 samples/sec Loss 0.8590 LearningRate 0.0007 Epoch: 18 Global Step: 228330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:32:59,795-Speed 3327.33 samples/sec Loss 0.8540 LearningRate 0.0007 Epoch: 18 Global Step: 228340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:02,920-Speed 3277.24 samples/sec Loss 0.8304 LearningRate 0.0007 Epoch: 18 Global Step: 228350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:06,073-Speed 3249.34 samples/sec Loss 0.8322 LearningRate 0.0007 Epoch: 18 Global Step: 228360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:09,206-Speed 3269.55 samples/sec Loss 0.8610 LearningRate 0.0007 Epoch: 18 Global Step: 228370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:12,277-Speed 3334.62 samples/sec Loss 0.8273 LearningRate 0.0007 Epoch: 18 Global Step: 228380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:33:15,382-Speed 3299.38 samples/sec Loss 0.8380 LearningRate 0.0007 Epoch: 18 Global Step: 228390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:18,551-Speed 3232.70 samples/sec Loss 0.8545 LearningRate 0.0006 Epoch: 18 Global Step: 228400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:21,633-Speed 3322.78 samples/sec Loss 0.8463 LearningRate 0.0006 Epoch: 18 Global Step: 228410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:24,725-Speed 3313.36 samples/sec Loss 0.8210 LearningRate 0.0006 Epoch: 18 Global Step: 228420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:27,878-Speed 3248.96 samples/sec Loss 0.8380 LearningRate 0.0006 Epoch: 18 Global Step: 228430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:30,940-Speed 3344.22 samples/sec Loss 0.8274 LearningRate 0.0006 Epoch: 18 Global Step: 228440 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 21:33:34,026-Speed 3319.92 samples/sec Loss 0.8251 LearningRate 0.0006 Epoch: 18 Global Step: 228450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:37,122-Speed 3307.88 samples/sec Loss 0.8418 LearningRate 0.0006 Epoch: 18 Global Step: 228460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:40,177-Speed 3353.38 samples/sec Loss 0.8216 LearningRate 0.0006 Epoch: 18 Global Step: 228470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:43,275-Speed 3305.69 samples/sec Loss 0.8732 LearningRate 0.0006 Epoch: 18 Global Step: 228480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:46,361-Speed 3319.81 samples/sec Loss 0.8944 LearningRate 0.0006 Epoch: 18 Global Step: 228490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:33:49,466-Speed 3298.61 samples/sec Loss 0.8312 LearningRate 0.0006 Epoch: 18 Global Step: 228500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:52,556-Speed 3314.87 samples/sec Loss 0.8732 LearningRate 0.0006 Epoch: 18 Global Step: 228510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:55,714-Speed 3244.42 samples/sec Loss 0.8258 LearningRate 0.0006 Epoch: 18 Global Step: 228520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:33:58,791-Speed 3328.54 samples/sec Loss 0.7978 LearningRate 0.0006 Epoch: 18 Global Step: 228530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:34:01,938-Speed 3255.33 samples/sec Loss 0.8529 LearningRate 0.0006 Epoch: 18 Global Step: 228540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:34:05,041-Speed 3300.95 samples/sec Loss 0.8488 LearningRate 0.0006 Epoch: 18 Global Step: 228550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:34:08,095-Speed 3354.01 samples/sec Loss 0.8618 LearningRate 0.0006 Epoch: 18 Global Step: 228560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
21:34:11,152-Speed 3350.35 samples/sec Loss 0.8443 LearningRate 0.0006 Epoch: 18 Global Step: 228570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:34:14,261-Speed 3295.06 samples/sec Loss 0.8374 LearningRate 0.0006 Epoch: 18 Global Step: 228580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:34:17,408-Speed 3254.65 samples/sec Loss 0.8831 LearningRate 0.0006 Epoch: 18 Global Step: 228590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:34:20,490-Speed 3323.62 samples/sec Loss 0.8304 LearningRate 0.0006 Epoch: 18 Global Step: 228600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:34:23,595-Speed 3299.62 samples/sec Loss 0.8164 LearningRate 0.0006 Epoch: 18 Global Step: 228610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:34:26,736-Speed 3261.06 samples/sec Loss 0.8643 LearningRate 0.0006 Epoch: 18 Global Step: 228620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:34:29,892-Speed 3245.75 samples/sec Loss 0.8522 LearningRate 0.0006 Epoch: 18 Global Step: 228630 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:32,957-Speed 3340.84 samples/sec Loss 0.8657 LearningRate 0.0006 Epoch: 18 Global Step: 228640 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:36,062-Speed 3299.80 samples/sec Loss 0.8595 LearningRate 0.0006 Epoch: 18 Global Step: 228650 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:39,197-Speed 3267.53 samples/sec Loss 0.8553 LearningRate 0.0006 Epoch: 18 Global Step: 228660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:42,433-Speed 3165.11 samples/sec Loss 0.8775 LearningRate 0.0006 Epoch: 18 Global Step: 228670 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:45,498-Speed 3342.06 samples/sec Loss 0.8469 LearningRate 0.0006 Epoch: 18 Global Step: 228680 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:48,617-Speed 3283.63 samples/sec Loss 
0.8505 LearningRate 0.0006 Epoch: 18 Global Step: 228690 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:51,736-Speed 3284.25 samples/sec Loss 0.8962 LearningRate 0.0006 Epoch: 18 Global Step: 228700 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:54,823-Speed 3318.28 samples/sec Loss 0.8105 LearningRate 0.0006 Epoch: 18 Global Step: 228710 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:34:57,868-Speed 3364.09 samples/sec Loss 0.8450 LearningRate 0.0006 Epoch: 18 Global Step: 228720 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:35:01,083-Speed 3186.38 samples/sec Loss 0.8391 LearningRate 0.0006 Epoch: 18 Global Step: 228730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:35:04,193-Speed 3293.82 samples/sec Loss 0.8356 LearningRate 0.0006 Epoch: 18 Global Step: 228740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:35:07,297-Speed 3299.83 samples/sec Loss 0.8455 LearningRate 0.0006 Epoch: 18 Global Step: 228750 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:35:10,393-Speed 3308.14 samples/sec Loss 0.8593 LearningRate 0.0006 Epoch: 18 Global Step: 228760 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:35:13,600-Speed 3194.54 samples/sec Loss 0.8129 LearningRate 0.0006 Epoch: 18 Global Step: 228770 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:35:16,689-Speed 3315.83 samples/sec Loss 0.8472 LearningRate 0.0006 Epoch: 18 Global Step: 228780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:35:19,767-Speed 3328.01 samples/sec Loss 0.8174 LearningRate 0.0006 Epoch: 18 Global Step: 228790 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:35:22,918-Speed 3250.21 samples/sec Loss 0.8380 LearningRate 0.0006 Epoch: 18 Global Step: 228800 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:26,052-Speed 3268.43 samples/sec Loss 0.8682 LearningRate 0.0006 Epoch: 18 Global Step: 
228810 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:29,158-Speed 3298.38 samples/sec Loss 0.8898 LearningRate 0.0006 Epoch: 18 Global Step: 228820 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:32,275-Speed 3286.62 samples/sec Loss 0.8663 LearningRate 0.0006 Epoch: 18 Global Step: 228830 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:35,372-Speed 3307.61 samples/sec Loss 0.8793 LearningRate 0.0006 Epoch: 18 Global Step: 228840 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:38,527-Speed 3246.66 samples/sec Loss 0.8257 LearningRate 0.0006 Epoch: 18 Global Step: 228850 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:41,632-Speed 3298.51 samples/sec Loss 0.8656 LearningRate 0.0006 Epoch: 18 Global Step: 228860 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:44,718-Speed 3318.63 samples/sec Loss 0.8414 LearningRate 0.0006 Epoch: 18 Global Step: 228870 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:47,836-Speed 3286.07 samples/sec Loss 0.8407 LearningRate 0.0006 Epoch: 18 Global Step: 228880 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:51,023-Speed 3213.91 samples/sec Loss 0.8570 LearningRate 0.0006 Epoch: 18 Global Step: 228890 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:35:54,138-Speed 3288.34 samples/sec Loss 0.8109 LearningRate 0.0006 Epoch: 18 Global Step: 228900 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:35:57,206-Speed 3338.80 samples/sec Loss 0.8877 LearningRate 0.0006 Epoch: 18 Global Step: 228910 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:00,272-Speed 3341.00 samples/sec Loss 0.8083 LearningRate 0.0006 Epoch: 18 Global Step: 228920 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:03,429-Speed 3244.08 samples/sec Loss 0.8503 LearningRate 0.0006 Epoch: 18 Global Step: 228930 Fp16 Grad Scale: 8192 Required: 2 hours 
Training: 2022-04-27 21:36:06,579-Speed 3252.35 samples/sec Loss 0.8509 LearningRate 0.0006 Epoch: 18 Global Step: 228940 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:09,652-Speed 3333.13 samples/sec Loss 0.8561 LearningRate 0.0006 Epoch: 18 Global Step: 228950 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:12,744-Speed 3313.02 samples/sec Loss 0.8902 LearningRate 0.0006 Epoch: 18 Global Step: 228960 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:15,905-Speed 3240.36 samples/sec Loss 0.8453 LearningRate 0.0006 Epoch: 18 Global Step: 228970 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:19,053-Speed 3253.94 samples/sec Loss 0.8302 LearningRate 0.0006 Epoch: 18 Global Step: 228980 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:22,132-Speed 3327.08 samples/sec Loss 0.8347 LearningRate 0.0006 Epoch: 18 Global Step: 228990 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:25,209-Speed 3327.79 samples/sec Loss 0.8505 LearningRate 0.0006 Epoch: 18 Global Step: 229000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:36:28,399-Speed 3211.38 samples/sec Loss 0.8287 LearningRate 0.0006 Epoch: 18 Global Step: 229010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:36:31,552-Speed 3249.13 samples/sec Loss 0.8827 LearningRate 0.0006 Epoch: 18 Global Step: 229020 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:34,662-Speed 3293.00 samples/sec Loss 0.8240 LearningRate 0.0006 Epoch: 18 Global Step: 229030 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:37,808-Speed 3256.55 samples/sec Loss 0.8553 LearningRate 0.0006 Epoch: 18 Global Step: 229040 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:40,919-Speed 3292.26 samples/sec Loss 0.8814 LearningRate 0.0006 Epoch: 18 Global Step: 229050 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:44,040-Speed 3282.64 
samples/sec Loss 0.8431 LearningRate 0.0006 Epoch: 18 Global Step: 229060 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:47,096-Speed 3351.05 samples/sec Loss 0.8484 LearningRate 0.0006 Epoch: 18 Global Step: 229070 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:50,168-Speed 3334.80 samples/sec Loss 0.8487 LearningRate 0.0006 Epoch: 18 Global Step: 229080 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:53,307-Speed 3262.60 samples/sec Loss 0.8725 LearningRate 0.0006 Epoch: 18 Global Step: 229090 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:56,368-Speed 3346.46 samples/sec Loss 0.8679 LearningRate 0.0006 Epoch: 18 Global Step: 229100 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:36:59,464-Speed 3309.44 samples/sec Loss 0.8450 LearningRate 0.0006 Epoch: 18 Global Step: 229110 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:37:02,601-Speed 3265.31 samples/sec Loss 0.8054 LearningRate 0.0006 Epoch: 18 Global Step: 229120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:05,743-Speed 3259.59 samples/sec Loss 0.8511 LearningRate 0.0006 Epoch: 18 Global Step: 229130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:08,815-Speed 3335.13 samples/sec Loss 0.8394 LearningRate 0.0006 Epoch: 18 Global Step: 229140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:11,962-Speed 3254.32 samples/sec Loss 0.8382 LearningRate 0.0006 Epoch: 18 Global Step: 229150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:15,111-Speed 3252.78 samples/sec Loss 0.8533 LearningRate 0.0006 Epoch: 18 Global Step: 229160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:18,240-Speed 3273.99 samples/sec Loss 0.8366 LearningRate 0.0006 Epoch: 18 Global Step: 229170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:21,326-Speed 3319.26 samples/sec Loss 0.8109 LearningRate 0.0006 
Epoch: 18 Global Step: 229180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:24,504-Speed 3223.35 samples/sec Loss 0.8356 LearningRate 0.0006 Epoch: 18 Global Step: 229190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:27,661-Speed 3243.95 samples/sec Loss 0.8147 LearningRate 0.0006 Epoch: 18 Global Step: 229200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:30,799-Speed 3264.84 samples/sec Loss 0.8351 LearningRate 0.0006 Epoch: 18 Global Step: 229210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:33,862-Speed 3343.94 samples/sec Loss 0.8285 LearningRate 0.0006 Epoch: 18 Global Step: 229220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:37:36,958-Speed 3308.70 samples/sec Loss 0.8413 LearningRate 0.0006 Epoch: 18 Global Step: 229230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:37:40,046-Speed 3317.65 samples/sec Loss 0.8170 LearningRate 0.0006 Epoch: 18 Global Step: 229240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:37:43,183-Speed 3264.83 samples/sec Loss 0.8429 LearningRate 0.0006 Epoch: 18 Global Step: 229250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:37:46,303-Speed 3283.57 samples/sec Loss 0.8363 LearningRate 0.0006 Epoch: 18 Global Step: 229260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:49,396-Speed 3310.99 samples/sec Loss 0.8296 LearningRate 0.0006 Epoch: 18 Global Step: 229270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:37:52,505-Speed 3295.11 samples/sec Loss 0.8576 LearningRate 0.0006 Epoch: 18 Global Step: 229280 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:37:55,613-Speed 3295.65 samples/sec Loss 0.8230 LearningRate 0.0006 Epoch: 18 Global Step: 229290 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:37:58,708-Speed 3309.71 samples/sec Loss 0.8541 LearningRate 0.0006 Epoch: 18 Global Step: 229300 Fp16 Grad 
Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:01,790-Speed 3324.25 samples/sec Loss 0.8494 LearningRate 0.0006 Epoch: 18 Global Step: 229310 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:04,881-Speed 3314.06 samples/sec Loss 0.8351 LearningRate 0.0006 Epoch: 18 Global Step: 229320 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:07,978-Speed 3307.73 samples/sec Loss 0.8466 LearningRate 0.0006 Epoch: 18 Global Step: 229330 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:11,094-Speed 3286.75 samples/sec Loss 0.8184 LearningRate 0.0006 Epoch: 18 Global Step: 229340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:14,175-Speed 3325.17 samples/sec Loss 0.8244 LearningRate 0.0006 Epoch: 18 Global Step: 229350 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:17,296-Speed 3281.54 samples/sec Loss 0.8475 LearningRate 0.0006 Epoch: 18 Global Step: 229360 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:20,423-Speed 3276.40 samples/sec Loss 0.8474 LearningRate 0.0006 Epoch: 18 Global Step: 229370 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:23,601-Speed 3222.64 samples/sec Loss 0.8814 LearningRate 0.0006 Epoch: 18 Global Step: 229380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:38:26,714-Speed 3291.03 samples/sec Loss 0.8286 LearningRate 0.0006 Epoch: 18 Global Step: 229390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:38:29,850-Speed 3266.22 samples/sec Loss 0.8831 LearningRate 0.0006 Epoch: 18 Global Step: 229400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:38:32,928-Speed 3327.39 samples/sec Loss 0.8362 LearningRate 0.0006 Epoch: 18 Global Step: 229410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:38:36,042-Speed 3289.66 samples/sec Loss 0.8383 LearningRate 0.0006 Epoch: 18 Global Step: 229420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
21:38:39,117-Speed 3330.86 samples/sec Loss 0.8582 LearningRate 0.0006 Epoch: 18 Global Step: 229430 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:42,256-Speed 3262.93 samples/sec Loss 0.8623 LearningRate 0.0006 Epoch: 18 Global Step: 229440 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:45,352-Speed 3308.58 samples/sec Loss 0.8256 LearningRate 0.0006 Epoch: 18 Global Step: 229450 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:48,490-Speed 3264.96 samples/sec Loss 0.8100 LearningRate 0.0006 Epoch: 18 Global Step: 229460 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:51,572-Speed 3323.81 samples/sec Loss 0.8672 LearningRate 0.0006 Epoch: 18 Global Step: 229470 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:54,660-Speed 3316.64 samples/sec Loss 0.8549 LearningRate 0.0006 Epoch: 18 Global Step: 229480 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:38:57,722-Speed 3344.68 samples/sec Loss 0.8348 LearningRate 0.0006 Epoch: 18 Global Step: 229490 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:39:00,872-Speed 3251.84 samples/sec Loss 0.8199 LearningRate 0.0006 Epoch: 18 Global Step: 229500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:39:03,992-Speed 3283.79 samples/sec Loss 0.8577 LearningRate 0.0006 Epoch: 18 Global Step: 229510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:39:07,134-Speed 3260.00 samples/sec Loss 0.8440 LearningRate 0.0006 Epoch: 18 Global Step: 229520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:39:10,183-Speed 3358.79 samples/sec Loss 0.8569 LearningRate 0.0006 Epoch: 18 Global Step: 229530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:13,262-Speed 3326.95 samples/sec Loss 0.8598 LearningRate 0.0006 Epoch: 18 Global Step: 229540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:16,328-Speed 3341.34 samples/sec Loss 0.8400 
LearningRate 0.0006 Epoch: 18 Global Step: 229550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:19,458-Speed 3272.38 samples/sec Loss 0.8074 LearningRate 0.0006 Epoch: 18 Global Step: 229560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:22,535-Speed 3328.58 samples/sec Loss 0.8663 LearningRate 0.0006 Epoch: 18 Global Step: 229570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:25,605-Speed 3336.97 samples/sec Loss 0.8492 LearningRate 0.0006 Epoch: 18 Global Step: 229580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:28,743-Speed 3263.68 samples/sec Loss 0.8333 LearningRate 0.0006 Epoch: 18 Global Step: 229590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:31,865-Speed 3281.43 samples/sec Loss 0.8105 LearningRate 0.0006 Epoch: 18 Global Step: 229600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:34,928-Speed 3343.64 samples/sec Loss 0.8190 LearningRate 0.0006 Epoch: 18 Global Step: 229610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:38,044-Speed 3287.66 samples/sec Loss 0.8125 LearningRate 0.0006 Epoch: 18 Global Step: 229620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:41,153-Speed 3294.80 samples/sec Loss 0.8125 LearningRate 0.0006 Epoch: 18 Global Step: 229630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:39:44,246-Speed 3312.34 samples/sec Loss 0.8192 LearningRate 0.0006 Epoch: 18 Global Step: 229640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:39:47,367-Speed 3281.52 samples/sec Loss 0.8593 LearningRate 0.0006 Epoch: 18 Global Step: 229650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:39:50,481-Speed 3290.01 samples/sec Loss 0.8315 LearningRate 0.0006 Epoch: 18 Global Step: 229660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:39:53,557-Speed 3329.37 samples/sec Loss 0.8456 LearningRate 0.0006 Epoch: 18 Global Step: 
229670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:56,608-Speed 3358.63 samples/sec Loss 0.7931 LearningRate 0.0006 Epoch: 18 Global Step: 229680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:39:59,686-Speed 3327.26 samples/sec Loss 0.8240 LearningRate 0.0006 Epoch: 18 Global Step: 229690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:40:02,758-Speed 3334.49 samples/sec Loss 0.8051 LearningRate 0.0006 Epoch: 18 Global Step: 229700 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:05,914-Speed 3245.25 samples/sec Loss 0.8263 LearningRate 0.0006 Epoch: 18 Global Step: 229710 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:08,995-Speed 3325.57 samples/sec Loss 0.8379 LearningRate 0.0006 Epoch: 18 Global Step: 229720 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:12,058-Speed 3344.18 samples/sec Loss 0.8585 LearningRate 0.0006 Epoch: 18 Global Step: 229730 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:15,201-Speed 3259.22 samples/sec Loss 0.8306 LearningRate 0.0006 Epoch: 18 Global Step: 229740 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:18,313-Speed 3291.73 samples/sec Loss 0.8565 LearningRate 0.0006 Epoch: 18 Global Step: 229750 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:21,417-Speed 3299.26 samples/sec Loss 0.8329 LearningRate 0.0006 Epoch: 18 Global Step: 229760 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:24,498-Speed 3324.27 samples/sec Loss 0.8398 LearningRate 0.0006 Epoch: 18 Global Step: 229770 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:27,645-Speed 3255.49 samples/sec Loss 0.8550 LearningRate 0.0006 Epoch: 18 Global Step: 229780 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:30,809-Speed 3237.78 samples/sec Loss 0.8732 LearningRate 0.0006 Epoch: 18 Global Step: 229790 Fp16 Grad Scale: 8192 Required: 2 hours 
Training: 2022-04-27 21:40:33,899-Speed 3314.74 samples/sec Loss 0.7926 LearningRate 0.0006 Epoch: 18 Global Step: 229800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:40:36,988-Speed 3315.98 samples/sec Loss 0.8621 LearningRate 0.0006 Epoch: 18 Global Step: 229810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:40:40,096-Speed 3295.25 samples/sec Loss 0.8241 LearningRate 0.0006 Epoch: 18 Global Step: 229820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:40:43,193-Speed 3307.63 samples/sec Loss 0.8424 LearningRate 0.0006 Epoch: 18 Global Step: 229830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:40:46,248-Speed 3353.36 samples/sec Loss 0.8583 LearningRate 0.0006 Epoch: 18 Global Step: 229840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:40:49,344-Speed 3308.40 samples/sec Loss 0.8586 LearningRate 0.0006 Epoch: 18 Global Step: 229850 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:52,453-Speed 3294.91 samples/sec Loss 0.8342 LearningRate 0.0006 Epoch: 18 Global Step: 229860 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:55,570-Speed 3286.15 samples/sec Loss 0.8538 LearningRate 0.0006 Epoch: 18 Global Step: 229870 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:40:58,647-Speed 3329.50 samples/sec Loss 0.8798 LearningRate 0.0006 Epoch: 18 Global Step: 229880 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:41:01,743-Speed 3307.54 samples/sec Loss 0.8689 LearningRate 0.0006 Epoch: 18 Global Step: 229890 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:41:04,889-Speed 3256.43 samples/sec Loss 0.8531 LearningRate 0.0006 Epoch: 18 Global Step: 229900 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:41:08,048-Speed 3242.59 samples/sec Loss 0.8658 LearningRate 0.0006 Epoch: 18 Global Step: 229910 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:41:11,139-Speed 3313.75 
samples/sec Loss 0.8274 LearningRate 0.0006 Epoch: 18 Global Step: 229920 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:41:14,251-Speed 3291.94 samples/sec Loss 0.7980 LearningRate 0.0006 Epoch: 18 Global Step: 229930 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:41:17,325-Speed 3332.15 samples/sec Loss 0.8106 LearningRate 0.0006 Epoch: 18 Global Step: 229940 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:41:20,409-Speed 3321.82 samples/sec Loss 0.8455 LearningRate 0.0006 Epoch: 18 Global Step: 229950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:23,530-Speed 3282.09 samples/sec Loss 0.8292 LearningRate 0.0006 Epoch: 18 Global Step: 229960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:26,653-Speed 3279.92 samples/sec Loss 0.8131 LearningRate 0.0006 Epoch: 18 Global Step: 229970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:29,744-Speed 3313.28 samples/sec Loss 0.8439 LearningRate 0.0006 Epoch: 18 Global Step: 229980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:32,815-Speed 3336.34 samples/sec Loss 0.8306 LearningRate 0.0006 Epoch: 18 Global Step: 229990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:35,898-Speed 3322.15 samples/sec Loss 0.8144 LearningRate 0.0005 Epoch: 18 Global Step: 230000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:39,050-Speed 3251.26 samples/sec Loss 0.8401 LearningRate 0.0005 Epoch: 18 Global Step: 230010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:42,138-Speed 3317.43 samples/sec Loss 0.8621 LearningRate 0.0005 Epoch: 18 Global Step: 230020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:45,253-Speed 3288.51 samples/sec Loss 0.8531 LearningRate 0.0005 Epoch: 18 Global Step: 230030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:48,397-Speed 3258.83 samples/sec Loss 0.8136 LearningRate 0.0005 
Epoch: 18 Global Step: 230040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:51,482-Speed 3319.81 samples/sec Loss 0.8452 LearningRate 0.0005 Epoch: 18 Global Step: 230050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:54,653-Speed 3230.51 samples/sec Loss 0.8671 LearningRate 0.0005 Epoch: 18 Global Step: 230060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:41:57,753-Speed 3304.55 samples/sec Loss 0.8146 LearningRate 0.0005 Epoch: 18 Global Step: 230070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:00,852-Speed 3305.07 samples/sec Loss 0.8636 LearningRate 0.0005 Epoch: 18 Global Step: 230080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:04,042-Speed 3210.48 samples/sec Loss 0.8146 LearningRate 0.0005 Epoch: 18 Global Step: 230090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:07,213-Speed 3230.53 samples/sec Loss 0.8383 LearningRate 0.0005 Epoch: 18 Global Step: 230100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:10,300-Speed 3318.08 samples/sec Loss 0.8907 LearningRate 0.0005 Epoch: 18 Global Step: 230110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:13,393-Speed 3312.44 samples/sec Loss 0.8137 LearningRate 0.0005 Epoch: 18 Global Step: 230120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:16,512-Speed 3283.95 samples/sec Loss 0.8634 LearningRate 0.0005 Epoch: 18 Global Step: 230130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:19,630-Speed 3284.76 samples/sec Loss 0.8564 LearningRate 0.0005 Epoch: 18 Global Step: 230140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:22,685-Speed 3352.58 samples/sec Loss 0.8357 LearningRate 0.0005 Epoch: 18 Global Step: 230150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:25,794-Speed 3295.12 samples/sec Loss 0.8306 LearningRate 0.0005 Epoch: 18 Global Step: 230160 Fp16 Grad 
Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:28,903-Speed 3294.99 samples/sec Loss 0.8301 LearningRate 0.0005 Epoch: 18 Global Step: 230170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:31,994-Speed 3313.56 samples/sec Loss 0.8448 LearningRate 0.0005 Epoch: 18 Global Step: 230180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:35,109-Speed 3287.56 samples/sec Loss 0.7795 LearningRate 0.0005 Epoch: 18 Global Step: 230190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:42:38,146-Speed 3374.10 samples/sec Loss 0.8529 LearningRate 0.0005 Epoch: 18 Global Step: 230200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:42:41,257-Speed 3292.38 samples/sec Loss 0.7920 LearningRate 0.0005 Epoch: 18 Global Step: 230210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:42:44,373-Speed 3286.78 samples/sec Loss 0.8385 LearningRate 0.0005 Epoch: 18 Global Step: 230220 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:42:47,528-Speed 3246.90 samples/sec Loss 0.8400 LearningRate 0.0005 Epoch: 18 Global Step: 230230 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:42:50,690-Speed 3239.73 samples/sec Loss 0.8264 LearningRate 0.0005 Epoch: 18 Global Step: 230240 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:42:53,832-Speed 3259.38 samples/sec Loss 0.8501 LearningRate 0.0005 Epoch: 18 Global Step: 230250 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:42:56,888-Speed 3351.74 samples/sec Loss 0.8412 LearningRate 0.0005 Epoch: 18 Global Step: 230260 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:00,024-Speed 3267.09 samples/sec Loss 0.8325 LearningRate 0.0005 Epoch: 18 Global Step: 230270 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:03,109-Speed 3320.45 samples/sec Loss 0.8530 LearningRate 0.0005 Epoch: 18 Global Step: 230280 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 
21:43:06,186-Speed 3328.30 samples/sec Loss 0.8207 LearningRate 0.0005 Epoch: 18 Global Step: 230290 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:09,236-Speed 3358.23 samples/sec Loss 0.8309 LearningRate 0.0005 Epoch: 18 Global Step: 230300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:43:12,306-Speed 3337.42 samples/sec Loss 0.8269 LearningRate 0.0005 Epoch: 18 Global Step: 230310 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:15,380-Speed 3331.74 samples/sec Loss 0.8505 LearningRate 0.0005 Epoch: 18 Global Step: 230320 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:18,438-Speed 3350.05 samples/sec Loss 0.8072 LearningRate 0.0005 Epoch: 18 Global Step: 230330 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:21,523-Speed 3320.28 samples/sec Loss 0.8462 LearningRate 0.0005 Epoch: 18 Global Step: 230340 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:24,654-Speed 3271.63 samples/sec Loss 0.8244 LearningRate 0.0005 Epoch: 18 Global Step: 230350 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:27,727-Speed 3333.55 samples/sec Loss 0.7989 LearningRate 0.0005 Epoch: 18 Global Step: 230360 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:30,804-Speed 3327.95 samples/sec Loss 0.8277 LearningRate 0.0005 Epoch: 18 Global Step: 230370 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:33,864-Speed 3347.51 samples/sec Loss 0.8675 LearningRate 0.0005 Epoch: 18 Global Step: 230380 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:43:36,905-Speed 3369.47 samples/sec Loss 0.8399 LearningRate 0.0005 Epoch: 18 Global Step: 230390 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:43:40,039-Speed 3268.56 samples/sec Loss 0.8416 LearningRate 0.0005 Epoch: 18 Global Step: 230400 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:43:43,133-Speed 3311.39 samples/sec Loss 0.8345 
LearningRate 0.0005 Epoch: 18 Global Step: 230410 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:43:46,194-Speed 3346.01 samples/sec Loss 0.8380 LearningRate 0.0005 Epoch: 18 Global Step: 230420 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:43:49,304-Speed 3294.08 samples/sec Loss 0.8300 LearningRate 0.0005 Epoch: 18 Global Step: 230430 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:43:52,350-Speed 3363.46 samples/sec Loss 0.8346 LearningRate 0.0005 Epoch: 18 Global Step: 230440 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:43:55,420-Speed 3336.17 samples/sec Loss 0.8214 LearningRate 0.0005 Epoch: 18 Global Step: 230450 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:43:58,473-Speed 3355.37 samples/sec Loss 0.8646 LearningRate 0.0005 Epoch: 18 Global Step: 230460 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:44:01,596-Speed 3279.59 samples/sec Loss 0.8495 LearningRate 0.0005 Epoch: 18 Global Step: 230470 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:44:04,777-Speed 3220.97 samples/sec Loss 0.8118 LearningRate 0.0005 Epoch: 18 Global Step: 230480 Fp16 Grad Scale: 4096 Required: 2 hours Training: 2022-04-27 21:44:07,848-Speed 3335.07 samples/sec Loss 0.8413 LearningRate 0.0005 Epoch: 18 Global Step: 230490 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:10,924-Speed 3329.89 samples/sec Loss 0.8257 LearningRate 0.0005 Epoch: 18 Global Step: 230500 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:14,001-Speed 3329.32 samples/sec Loss 0.8783 LearningRate 0.0005 Epoch: 18 Global Step: 230510 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:17,095-Speed 3310.24 samples/sec Loss 0.8337 LearningRate 0.0005 Epoch: 18 Global Step: 230520 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:20,150-Speed 3353.09 samples/sec Loss 0.8437 LearningRate 0.0005 Epoch: 18 Global Step: 230530 Fp16 
Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:23,223-Speed 3333.39 samples/sec Loss 0.8446 LearningRate 0.0005 Epoch: 18 Global Step: 230540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:26,317-Speed 3310.98 samples/sec Loss 0.8172 LearningRate 0.0005 Epoch: 18 Global Step: 230550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:29,394-Speed 3328.32 samples/sec Loss 0.8077 LearningRate 0.0005 Epoch: 18 Global Step: 230560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:32,492-Speed 3306.35 samples/sec Loss 0.7915 LearningRate 0.0005 Epoch: 18 Global Step: 230570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:35,580-Speed 3317.24 samples/sec Loss 0.8192 LearningRate 0.0005 Epoch: 18 Global Step: 230580 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:38,663-Speed 3323.22 samples/sec Loss 0.8529 LearningRate 0.0005 Epoch: 18 Global Step: 230590 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:41,795-Speed 3269.51 samples/sec Loss 0.8575 LearningRate 0.0005 Epoch: 18 Global Step: 230600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:44,843-Speed 3361.52 samples/sec Loss 0.8288 LearningRate 0.0005 Epoch: 18 Global Step: 230610 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:48,004-Speed 3240.41 samples/sec Loss 0.8713 LearningRate 0.0005 Epoch: 18 Global Step: 230620 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:51,107-Speed 3301.11 samples/sec Loss 0.8089 LearningRate 0.0005 Epoch: 18 Global Step: 230630 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:54,247-Speed 3262.40 samples/sec Loss 0.8583 LearningRate 0.0005 Epoch: 18 Global Step: 230640 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:44:57,301-Speed 3353.75 samples/sec Loss 0.7999 LearningRate 0.0005 Epoch: 18 Global Step: 230650 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 
21:45:00,372-Speed 3335.90 samples/sec Loss 0.8124 LearningRate 0.0005 Epoch: 18 Global Step: 230660 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:45:03,475-Speed 3300.51 samples/sec Loss 0.7913 LearningRate 0.0005 Epoch: 18 Global Step: 230670 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:45:06,612-Speed 3265.46 samples/sec Loss 0.8435 LearningRate 0.0005 Epoch: 18 Global Step: 230680 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 21:45:09,704-Speed 3313.33 samples/sec Loss 0.8048 LearningRate 0.0005 Epoch: 18 Global Step: 230690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:12,843-Speed 3262.98 samples/sec Loss 0.8392 LearningRate 0.0005 Epoch: 18 Global Step: 230700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:15,966-Speed 3280.04 samples/sec Loss 0.8334 LearningRate 0.0005 Epoch: 18 Global Step: 230710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:19,074-Speed 3295.01 samples/sec Loss 0.8587 LearningRate 0.0005 Epoch: 18 Global Step: 230720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:22,197-Speed 3280.89 samples/sec Loss 0.8272 LearningRate 0.0005 Epoch: 18 Global Step: 230730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:25,292-Speed 3309.61 samples/sec Loss 0.8458 LearningRate 0.0005 Epoch: 18 Global Step: 230740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:28,490-Speed 3202.76 samples/sec Loss 0.8744 LearningRate 0.0005 Epoch: 18 Global Step: 230750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:31,574-Speed 3320.44 samples/sec Loss 0.8318 LearningRate 0.0005 Epoch: 18 Global Step: 230760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:34,639-Speed 3342.73 samples/sec Loss 0.8314 LearningRate 0.0005 Epoch: 18 Global Step: 230770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:37,701-Speed 3345.72 samples/sec Loss 
0.8358 LearningRate 0.0005 Epoch: 18 Global Step: 230780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:40,786-Speed 3319.63 samples/sec Loss 0.8271 LearningRate 0.0005 Epoch: 18 Global Step: 230790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:43,908-Speed 3280.75 samples/sec Loss 0.8135 LearningRate 0.0005 Epoch: 18 Global Step: 230800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:46,965-Speed 3351.64 samples/sec Loss 0.8301 LearningRate 0.0005 Epoch: 18 Global Step: 230810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:50,040-Speed 3330.34 samples/sec Loss 0.8087 LearningRate 0.0005 Epoch: 18 Global Step: 230820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:53,188-Speed 3254.00 samples/sec Loss 0.8189 LearningRate 0.0005 Epoch: 18 Global Step: 230830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:56,298-Speed 3294.08 samples/sec Loss 0.8346 LearningRate 0.0005 Epoch: 18 Global Step: 230840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:45:59,419-Speed 3281.88 samples/sec Loss 0.8471 LearningRate 0.0005 Epoch: 18 Global Step: 230850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:02,536-Speed 3285.76 samples/sec Loss 0.8695 LearningRate 0.0005 Epoch: 18 Global Step: 230860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:05,631-Speed 3310.14 samples/sec Loss 0.8755 LearningRate 0.0005 Epoch: 18 Global Step: 230870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:08,717-Speed 3318.46 samples/sec Loss 0.8259 LearningRate 0.0005 Epoch: 18 Global Step: 230880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:11,828-Speed 3292.90 samples/sec Loss 0.8310 LearningRate 0.0005 Epoch: 18 Global Step: 230890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:46:14,903-Speed 3330.93 samples/sec Loss 0.8249 LearningRate 0.0005 Epoch: 18 Global 
Step: 230900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:18,011-Speed 3295.61 samples/sec Loss 0.7944 LearningRate 0.0005 Epoch: 18 Global Step: 230910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:21,065-Speed 3354.53 samples/sec Loss 0.8210 LearningRate 0.0005 Epoch: 18 Global Step: 230920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:24,192-Speed 3275.67 samples/sec Loss 0.8366 LearningRate 0.0005 Epoch: 18 Global Step: 230930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:27,270-Speed 3327.76 samples/sec Loss 0.8527 LearningRate 0.0005 Epoch: 18 Global Step: 230940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:30,358-Speed 3317.02 samples/sec Loss 0.8304 LearningRate 0.0005 Epoch: 18 Global Step: 230950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:33,489-Speed 3272.03 samples/sec Loss 0.8318 LearningRate 0.0005 Epoch: 18 Global Step: 230960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:36,640-Speed 3250.16 samples/sec Loss 0.7959 LearningRate 0.0005 Epoch: 18 Global Step: 230970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:39,806-Speed 3235.74 samples/sec Loss 0.8594 LearningRate 0.0005 Epoch: 18 Global Step: 230980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:42,955-Speed 3253.14 samples/sec Loss 0.8410 LearningRate 0.0005 Epoch: 18 Global Step: 230990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:46,047-Speed 3313.07 samples/sec Loss 0.8176 LearningRate 0.0005 Epoch: 18 Global Step: 231000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:46:49,149-Speed 3301.93 samples/sec Loss 0.8253 LearningRate 0.0005 Epoch: 18 Global Step: 231010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 21:46:52,264-Speed 3287.80 samples/sec Loss 0.8773 LearningRate 0.0005 Epoch: 18 Global Step: 231020 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 21:46:55,348-Speed 3321.43 samples/sec Loss 0.8020 LearningRate 0.0005 Epoch: 18 Global Step: 231030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:46:58,470-Speed 3280.76 samples/sec Loss 0.8190 LearningRate 0.0005 Epoch: 18 Global Step: 231040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:47:01,692-Speed 3179.29 samples/sec Loss 0.8569 LearningRate 0.0005 Epoch: 18 Global Step: 231050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:47:04,836-Speed 3258.50 samples/sec Loss 0.8492 LearningRate 0.0005 Epoch: 18 Global Step: 231060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:47:07,945-Speed 3294.21 samples/sec Loss 0.8153 LearningRate 0.0005 Epoch: 18 Global Step: 231070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:47:11,073-Speed 3274.99 samples/sec Loss 0.7942 LearningRate 0.0005 Epoch: 18 Global Step: 231080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:47:14,165-Speed 3312.48 samples/sec Loss 0.8446 LearningRate 0.0005 Epoch: 18 Global Step: 231090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 21:47:17,277-Speed 3292.03 samples/sec Loss 0.8402 LearningRate 0.0005 Epoch: 18 Global Step: 231100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:47:20,433-Speed 3245.40 samples/sec Loss 0.8169 LearningRate 0.0005 Epoch: 18 Global Step: 231110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:47:23,573-Speed 3261.79 samples/sec Loss 0.8467 LearningRate 0.0005 Epoch: 18 Global Step: 231120 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:26,677-Speed 3300.07 samples/sec Loss 0.8389 LearningRate 0.0005 Epoch: 18 Global Step: 231130 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:29,829-Speed 3249.53 samples/sec Loss 0.8382 LearningRate 0.0005 Epoch: 18 Global Step: 231140 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 
21:47:32,919-Speed 3314.72 samples/sec Loss 0.8323 LearningRate 0.0005 Epoch: 18 Global Step: 231150 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:36,004-Speed 3320.66 samples/sec Loss 0.8405 LearningRate 0.0005 Epoch: 18 Global Step: 231160 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:39,167-Speed 3239.20 samples/sec Loss 0.8107 LearningRate 0.0005 Epoch: 18 Global Step: 231170 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:42,301-Speed 3267.54 samples/sec Loss 0.8461 LearningRate 0.0005 Epoch: 18 Global Step: 231180 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:45,379-Speed 3327.84 samples/sec Loss 0.8422 LearningRate 0.0005 Epoch: 18 Global Step: 231190 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:48,577-Speed 3203.60 samples/sec Loss 0.8512 LearningRate 0.0005 Epoch: 18 Global Step: 231200 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:51,768-Speed 3209.92 samples/sec Loss 0.8627 LearningRate 0.0005 Epoch: 18 Global Step: 231210 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:47:54,878-Speed 3293.30 samples/sec Loss 0.8583 LearningRate 0.0005 Epoch: 18 Global Step: 231220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:47:57,955-Speed 3329.22 samples/sec Loss 0.8410 LearningRate 0.0005 Epoch: 18 Global Step: 231230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:48:01,004-Speed 3359.41 samples/sec Loss 0.8126 LearningRate 0.0005 Epoch: 18 Global Step: 231240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:48:04,114-Speed 3294.06 samples/sec Loss 0.8119 LearningRate 0.0005 Epoch: 18 Global Step: 231250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:48:07,244-Speed 3272.36 samples/sec Loss 0.8230 LearningRate 0.0005 Epoch: 18 Global Step: 231260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:48:10,298-Speed 3353.87 samples/sec Loss 0.8789 
LearningRate 0.0005 Epoch: 18 Global Step: 231270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:48:13,370-Speed 3334.73 samples/sec Loss 0.8170 LearningRate 0.0005 Epoch: 18 Global Step: 231280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:48:16,447-Speed 3329.20 samples/sec Loss 0.8156 LearningRate 0.0005 Epoch: 18 Global Step: 231290 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:19,546-Speed 3305.37 samples/sec Loss 0.8393 LearningRate 0.0005 Epoch: 18 Global Step: 231300 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:22,613-Speed 3339.94 samples/sec Loss 0.8281 LearningRate 0.0005 Epoch: 18 Global Step: 231310 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:25,746-Speed 3268.88 samples/sec Loss 0.8175 LearningRate 0.0005 Epoch: 18 Global Step: 231320 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:28,869-Speed 3280.44 samples/sec Loss 0.8193 LearningRate 0.0005 Epoch: 18 Global Step: 231330 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:31,941-Speed 3333.78 samples/sec Loss 0.8481 LearningRate 0.0005 Epoch: 18 Global Step: 231340 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:35,021-Speed 3326.08 samples/sec Loss 0.7958 LearningRate 0.0005 Epoch: 18 Global Step: 231350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:38,125-Speed 3299.69 samples/sec Loss 0.8553 LearningRate 0.0005 Epoch: 18 Global Step: 231360 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:41,239-Speed 3290.25 samples/sec Loss 0.8068 LearningRate 0.0005 Epoch: 18 Global Step: 231370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:44,402-Speed 3237.91 samples/sec Loss 0.8460 LearningRate 0.0005 Epoch: 18 Global Step: 231380 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:47,478-Speed 3330.62 samples/sec Loss 0.8221 LearningRate 0.0005 Epoch: 18 Global Step: 231390 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:48:50,542-Speed 3342.95 samples/sec Loss 0.8223 LearningRate 0.0005 Epoch: 18 Global Step: 231400 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:53,626-Speed 3321.22 samples/sec Loss 0.8427 LearningRate 0.0005 Epoch: 18 Global Step: 231410 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:56,724-Speed 3306.92 samples/sec Loss 0.8043 LearningRate 0.0005 Epoch: 18 Global Step: 231420 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:48:59,788-Speed 3342.61 samples/sec Loss 0.7914 LearningRate 0.0005 Epoch: 18 Global Step: 231430 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:49:02,963-Speed 3226.33 samples/sec Loss 0.8788 LearningRate 0.0005 Epoch: 18 Global Step: 231440 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:49:06,118-Speed 3247.03 samples/sec Loss 0.8514 LearningRate 0.0005 Epoch: 18 Global Step: 231450 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:49:09,199-Speed 3324.67 samples/sec Loss 0.8575 LearningRate 0.0005 Epoch: 18 Global Step: 231460 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:49:12,293-Speed 3310.89 samples/sec Loss 0.8337 LearningRate 0.0005 Epoch: 18 Global Step: 231470 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:49:15,345-Speed 3356.03 samples/sec Loss 0.8329 LearningRate 0.0005 Epoch: 18 Global Step: 231480 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:49:18,413-Speed 3338.77 samples/sec Loss 0.8167 LearningRate 0.0005 Epoch: 18 Global Step: 231490 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:49:21,468-Speed 3353.09 samples/sec Loss 0.8505 LearningRate 0.0005 Epoch: 18 Global Step: 231500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:24,555-Speed 3318.50 samples/sec Loss 0.8289 LearningRate 0.0005 Epoch: 18 Global Step: 231510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-04-27 21:49:27,628-Speed 3333.12 samples/sec Loss 0.8634 LearningRate 0.0005 Epoch: 18 Global Step: 231520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:30,728-Speed 3303.75 samples/sec Loss 0.8109 LearningRate 0.0005 Epoch: 18 Global Step: 231530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:33,794-Speed 3341.28 samples/sec Loss 0.8154 LearningRate 0.0005 Epoch: 18 Global Step: 231540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:36,873-Speed 3326.69 samples/sec Loss 0.8854 LearningRate 0.0005 Epoch: 18 Global Step: 231550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:40,015-Speed 3260.99 samples/sec Loss 0.8368 LearningRate 0.0005 Epoch: 18 Global Step: 231560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:43,187-Speed 3228.76 samples/sec Loss 0.7985 LearningRate 0.0005 Epoch: 18 Global Step: 231570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:46,236-Speed 3359.91 samples/sec Loss 0.8166 LearningRate 0.0005 Epoch: 18 Global Step: 231580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:49,317-Speed 3323.74 samples/sec Loss 0.8684 LearningRate 0.0005 Epoch: 18 Global Step: 231590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:49:52,448-Speed 3272.65 samples/sec Loss 0.8237 LearningRate 0.0005 Epoch: 18 Global Step: 231600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:49:55,553-Speed 3298.21 samples/sec Loss 0.8151 LearningRate 0.0005 Epoch: 18 Global Step: 231610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:49:58,632-Speed 3327.58 samples/sec Loss 0.8431 LearningRate 0.0005 Epoch: 18 Global Step: 231620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:50:01,707-Speed 3331.37 samples/sec Loss 0.8313 LearningRate 0.0005 Epoch: 18 Global Step: 231630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:50:04,815-Speed 3295.34 
samples/sec Loss 0.8510 LearningRate 0.0005 Epoch: 18 Global Step: 231640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:50:07,869-Speed 3354.02 samples/sec Loss 0.8032 LearningRate 0.0005 Epoch: 18 Global Step: 231650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:10,943-Speed 3332.76 samples/sec Loss 0.8411 LearningRate 0.0005 Epoch: 18 Global Step: 231660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:14,167-Speed 3177.01 samples/sec Loss 0.8182 LearningRate 0.0005 Epoch: 18 Global Step: 231670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:17,362-Speed 3205.41 samples/sec Loss 0.8474 LearningRate 0.0005 Epoch: 18 Global Step: 231680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:20,462-Speed 3304.33 samples/sec Loss 0.8540 LearningRate 0.0005 Epoch: 18 Global Step: 231690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:23,593-Speed 3271.14 samples/sec Loss 0.8463 LearningRate 0.0005 Epoch: 18 Global Step: 231700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:26,751-Speed 3244.11 samples/sec Loss 0.8512 LearningRate 0.0005 Epoch: 18 Global Step: 231710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:29,895-Speed 3258.43 samples/sec Loss 0.8050 LearningRate 0.0005 Epoch: 18 Global Step: 231720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:32,984-Speed 3315.74 samples/sec Loss 0.8312 LearningRate 0.0005 Epoch: 18 Global Step: 231730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:36,135-Speed 3250.84 samples/sec Loss 0.8400 LearningRate 0.0005 Epoch: 18 Global Step: 231740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:39,232-Speed 3306.98 samples/sec Loss 0.8211 LearningRate 0.0005 Epoch: 18 Global Step: 231750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:42,372-Speed 3263.47 samples/sec Loss 0.8290 LearningRate 0.0004 
Epoch: 18 Global Step: 231760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:45,447-Speed 3330.92 samples/sec Loss 0.8486 LearningRate 0.0004 Epoch: 18 Global Step: 231770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:48,553-Speed 3297.93 samples/sec Loss 0.8826 LearningRate 0.0004 Epoch: 18 Global Step: 231780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:51,759-Speed 3194.51 samples/sec Loss 0.8147 LearningRate 0.0004 Epoch: 18 Global Step: 231790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:54,868-Speed 3295.61 samples/sec Loss 0.8252 LearningRate 0.0004 Epoch: 18 Global Step: 231800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:50:57,926-Speed 3349.37 samples/sec Loss 0.8674 LearningRate 0.0004 Epoch: 18 Global Step: 231810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:01,066-Speed 3263.10 samples/sec Loss 0.8125 LearningRate 0.0004 Epoch: 18 Global Step: 231820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:04,170-Speed 3299.62 samples/sec Loss 0.8336 LearningRate 0.0004 Epoch: 18 Global Step: 231830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:07,276-Speed 3298.61 samples/sec Loss 0.8357 LearningRate 0.0004 Epoch: 18 Global Step: 231840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:10,351-Speed 3330.51 samples/sec Loss 0.8074 LearningRate 0.0004 Epoch: 18 Global Step: 231850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:51:13,415-Speed 3343.95 samples/sec Loss 0.8190 LearningRate 0.0004 Epoch: 18 Global Step: 231860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:51:16,536-Speed 3281.31 samples/sec Loss 0.8172 LearningRate 0.0004 Epoch: 18 Global Step: 231870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:51:19,678-Speed 3260.25 samples/sec Loss 0.8369 LearningRate 0.0004 Epoch: 18 Global Step: 231880 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-27 21:51:22,777-Speed 3306.21 samples/sec Loss 0.8432 LearningRate 0.0004 Epoch: 18 Global Step: 231890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:25,887-Speed 3292.82 samples/sec Loss 0.8149 LearningRate 0.0004 Epoch: 18 Global Step: 231900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:29,050-Speed 3239.39 samples/sec Loss 0.7931 LearningRate 0.0004 Epoch: 18 Global Step: 231910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:32,206-Speed 3246.12 samples/sec Loss 0.8433 LearningRate 0.0004 Epoch: 18 Global Step: 231920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:35,285-Speed 3326.84 samples/sec Loss 0.8302 LearningRate 0.0004 Epoch: 18 Global Step: 231930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:38,380-Speed 3309.70 samples/sec Loss 0.8181 LearningRate 0.0004 Epoch: 18 Global Step: 231940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:41,510-Speed 3272.87 samples/sec Loss 0.8374 LearningRate 0.0004 Epoch: 18 Global Step: 231950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:51:44,663-Speed 3248.42 samples/sec Loss 0.8319 LearningRate 0.0004 Epoch: 18 Global Step: 231960 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:51:47,817-Speed 3248.23 samples/sec Loss 0.8557 LearningRate 0.0004 Epoch: 18 Global Step: 231970 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:51:50,912-Speed 3308.78 samples/sec Loss 0.8447 LearningRate 0.0004 Epoch: 18 Global Step: 231980 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:51:54,072-Speed 3241.11 samples/sec Loss 0.8129 LearningRate 0.0004 Epoch: 18 Global Step: 231990 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:51:57,143-Speed 3336.33 samples/sec Loss 0.8084 LearningRate 0.0004 Epoch: 18 Global Step: 232000 Fp16 Grad Scale: 8192 Required: 1 hours Training: 
2022-04-27 21:52:00,285-Speed 3260.20 samples/sec Loss 0.7952 LearningRate 0.0004 Epoch: 18 Global Step: 232010 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:52:03,430-Speed 3257.01 samples/sec Loss 0.8472 LearningRate 0.0004 Epoch: 18 Global Step: 232020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:52:06,500-Speed 3336.14 samples/sec Loss 0.8310 LearningRate 0.0004 Epoch: 18 Global Step: 232030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:52:09,568-Speed 3338.72 samples/sec Loss 0.8411 LearningRate 0.0004 Epoch: 18 Global Step: 232040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:52:12,699-Speed 3271.74 samples/sec Loss 0.8283 LearningRate 0.0004 Epoch: 18 Global Step: 232050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:52:15,887-Speed 3212.50 samples/sec Loss 0.8402 LearningRate 0.0004 Epoch: 18 Global Step: 232060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:52:19,022-Speed 3267.93 samples/sec Loss 0.8674 LearningRate 0.0004 Epoch: 18 Global Step: 232070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:52:22,069-Speed 3361.80 samples/sec Loss 0.8532 LearningRate 0.0004 Epoch: 18 Global Step: 232080 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:25,183-Speed 3289.72 samples/sec Loss 0.8346 LearningRate 0.0004 Epoch: 18 Global Step: 232090 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:28,273-Speed 3314.08 samples/sec Loss 0.8389 LearningRate 0.0004 Epoch: 18 Global Step: 232100 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:31,364-Speed 3313.83 samples/sec Loss 0.8153 LearningRate 0.0004 Epoch: 18 Global Step: 232110 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:34,467-Speed 3301.87 samples/sec Loss 0.8367 LearningRate 0.0004 Epoch: 18 Global Step: 232120 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:37,598-Speed 3271.28 samples/sec 
Loss 0.8481 LearningRate 0.0004 Epoch: 18 Global Step: 232130 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:40,687-Speed 3315.38 samples/sec Loss 0.8378 LearningRate 0.0004 Epoch: 18 Global Step: 232140 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:43,777-Speed 3315.75 samples/sec Loss 0.8487 LearningRate 0.0004 Epoch: 18 Global Step: 232150 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:46,881-Speed 3300.26 samples/sec Loss 0.8283 LearningRate 0.0004 Epoch: 18 Global Step: 232160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:49,969-Speed 3316.90 samples/sec Loss 0.7810 LearningRate 0.0004 Epoch: 18 Global Step: 232170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 21:52:53,106-Speed 3264.72 samples/sec Loss 0.8215 LearningRate 0.0004 Epoch: 18 Global Step: 232180 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:52:56,177-Speed 3335.42 samples/sec Loss 0.8297 LearningRate 0.0004 Epoch: 18 Global Step: 232190 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:52:59,280-Speed 3301.61 samples/sec Loss 0.8725 LearningRate 0.0004 Epoch: 18 Global Step: 232200 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:02,421-Speed 3260.40 samples/sec Loss 0.8016 LearningRate 0.0004 Epoch: 18 Global Step: 232210 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:05,542-Speed 3282.17 samples/sec Loss 0.8186 LearningRate 0.0004 Epoch: 18 Global Step: 232220 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:08,681-Speed 3263.56 samples/sec Loss 0.8054 LearningRate 0.0004 Epoch: 18 Global Step: 232230 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:11,787-Speed 3297.52 samples/sec Loss 0.8166 LearningRate 0.0004 Epoch: 18 Global Step: 232240 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:14,871-Speed 3321.51 samples/sec Loss 0.8011 LearningRate 0.0004 Epoch: 18 Global Step: 
232250 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:17,976-Speed 3300.06 samples/sec Loss 0.8560 LearningRate 0.0004 Epoch: 18 Global Step: 232260 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:21,047-Speed 3334.66 samples/sec Loss 0.7966 LearningRate 0.0004 Epoch: 18 Global Step: 232270 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:53:24,120-Speed 3333.64 samples/sec Loss 0.8047 LearningRate 0.0004 Epoch: 18 Global Step: 232280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:27,274-Speed 3247.75 samples/sec Loss 0.8601 LearningRate 0.0004 Epoch: 18 Global Step: 232290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:30,386-Speed 3291.24 samples/sec Loss 0.8818 LearningRate 0.0004 Epoch: 18 Global Step: 232300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:33,468-Speed 3323.82 samples/sec Loss 0.8032 LearningRate 0.0004 Epoch: 18 Global Step: 232310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:36,625-Speed 3243.78 samples/sec Loss 0.8009 LearningRate 0.0004 Epoch: 18 Global Step: 232320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:39,739-Speed 3289.42 samples/sec Loss 0.8286 LearningRate 0.0004 Epoch: 18 Global Step: 232330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:42,931-Speed 3208.97 samples/sec Loss 0.8410 LearningRate 0.0004 Epoch: 18 Global Step: 232340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:46,027-Speed 3308.88 samples/sec Loss 0.8075 LearningRate 0.0004 Epoch: 18 Global Step: 232350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:49,154-Speed 3275.97 samples/sec Loss 0.8516 LearningRate 0.0004 Epoch: 18 Global Step: 232360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:52,344-Speed 3211.36 samples/sec Loss 0.8040 LearningRate 0.0004 Epoch: 18 Global Step: 232370 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-04-27 21:53:55,454-Speed 3292.74 samples/sec Loss 0.8304 LearningRate 0.0004 Epoch: 18 Global Step: 232380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:53:58,560-Speed 3298.10 samples/sec Loss 0.7984 LearningRate 0.0004 Epoch: 18 Global Step: 232390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:01,647-Speed 3318.26 samples/sec Loss 0.8369 LearningRate 0.0004 Epoch: 18 Global Step: 232400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:04,767-Speed 3283.19 samples/sec Loss 0.8326 LearningRate 0.0004 Epoch: 18 Global Step: 232410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:07,846-Speed 3326.74 samples/sec Loss 0.8236 LearningRate 0.0004 Epoch: 18 Global Step: 232420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:10,942-Speed 3309.02 samples/sec Loss 0.8271 LearningRate 0.0004 Epoch: 18 Global Step: 232430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:14,035-Speed 3312.05 samples/sec Loss 0.8274 LearningRate 0.0004 Epoch: 18 Global Step: 232440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:17,116-Speed 3324.40 samples/sec Loss 0.8295 LearningRate 0.0004 Epoch: 18 Global Step: 232450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:20,250-Speed 3268.77 samples/sec Loss 0.8462 LearningRate 0.0004 Epoch: 18 Global Step: 232460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:23,368-Speed 3284.98 samples/sec Loss 0.8193 LearningRate 0.0004 Epoch: 18 Global Step: 232470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:26,568-Speed 3201.10 samples/sec Loss 0.8057 LearningRate 0.0004 Epoch: 18 Global Step: 232480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:54:29,688-Speed 3283.45 samples/sec Loss 0.8013 LearningRate 0.0004 Epoch: 18 Global Step: 232490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
21:54:32,784-Speed 3307.87 samples/sec Loss 0.8636 LearningRate 0.0004 Epoch: 18 Global Step: 232500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:35,917-Speed 3269.78 samples/sec Loss 0.8055 LearningRate 0.0004 Epoch: 18 Global Step: 232510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:39,001-Speed 3321.51 samples/sec Loss 0.7996 LearningRate 0.0004 Epoch: 18 Global Step: 232520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:42,069-Speed 3338.49 samples/sec Loss 0.7915 LearningRate 0.0004 Epoch: 18 Global Step: 232530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:54:45,139-Speed 3339.67 samples/sec Loss 0.8367 LearningRate 0.0004 Epoch: 18 Global Step: 232540 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:54:48,239-Speed 3304.28 samples/sec Loss 0.8331 LearningRate 0.0004 Epoch: 18 Global Step: 232550 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:54:51,310-Speed 3335.13 samples/sec Loss 0.8426 LearningRate 0.0004 Epoch: 18 Global Step: 232560 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:54:54,471-Speed 3240.48 samples/sec Loss 0.8353 LearningRate 0.0004 Epoch: 18 Global Step: 232570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:54:57,558-Speed 3317.95 samples/sec Loss 0.7999 LearningRate 0.0004 Epoch: 18 Global Step: 232580 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:55:00,654-Speed 3308.73 samples/sec Loss 0.8334 LearningRate 0.0004 Epoch: 18 Global Step: 232590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:55:03,761-Speed 3297.03 samples/sec Loss 0.8314 LearningRate 0.0004 Epoch: 18 Global Step: 232600 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:55:06,852-Speed 3313.61 samples/sec Loss 0.8360 LearningRate 0.0004 Epoch: 18 Global Step: 232610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:55:09,977-Speed 3277.74 samples/sec Loss 0.8412 
LearningRate 0.0004 Epoch: 18 Global Step: 232620 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:55:13,098-Speed 3281.65 samples/sec Loss 0.8094 LearningRate 0.0004 Epoch: 18 Global Step: 232630 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:55:16,230-Speed 3270.69 samples/sec Loss 0.8442 LearningRate 0.0004 Epoch: 18 Global Step: 232640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:19,359-Speed 3273.86 samples/sec Loss 0.8572 LearningRate 0.0004 Epoch: 18 Global Step: 232650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:22,464-Speed 3298.83 samples/sec Loss 0.8332 LearningRate 0.0004 Epoch: 18 Global Step: 232660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:25,567-Speed 3301.28 samples/sec Loss 0.8195 LearningRate 0.0004 Epoch: 18 Global Step: 232670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:28,695-Speed 3274.85 samples/sec Loss 0.7765 LearningRate 0.0004 Epoch: 18 Global Step: 232680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:31,796-Speed 3302.44 samples/sec Loss 0.8278 LearningRate 0.0004 Epoch: 18 Global Step: 232690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:34,903-Speed 3297.73 samples/sec Loss 0.8134 LearningRate 0.0004 Epoch: 18 Global Step: 232700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:38,014-Speed 3292.29 samples/sec Loss 0.8154 LearningRate 0.0004 Epoch: 18 Global Step: 232710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:41,089-Speed 3331.21 samples/sec Loss 0.8275 LearningRate 0.0004 Epoch: 18 Global Step: 232720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:44,168-Speed 3327.44 samples/sec Loss 0.8129 LearningRate 0.0004 Epoch: 18 Global Step: 232730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:47,308-Speed 3262.53 samples/sec Loss 0.8059 LearningRate 0.0004 Epoch: 18 Global Step: 
232740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:55:50,375-Speed 3339.70 samples/sec Loss 0.8204 LearningRate 0.0004 Epoch: 18 Global Step: 232750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:55:53,479-Speed 3299.96 samples/sec Loss 0.8172 LearningRate 0.0004 Epoch: 18 Global Step: 232760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:55:56,560-Speed 3324.75 samples/sec Loss 0.8042 LearningRate 0.0004 Epoch: 18 Global Step: 232770 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:55:59,666-Speed 3296.82 samples/sec Loss 0.8625 LearningRate 0.0004 Epoch: 18 Global Step: 232780 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:02,863-Speed 3204.67 samples/sec Loss 0.8550 LearningRate 0.0004 Epoch: 18 Global Step: 232790 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:05,974-Speed 3292.89 samples/sec Loss 0.8507 LearningRate 0.0004 Epoch: 18 Global Step: 232800 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:09,112-Speed 3263.69 samples/sec Loss 0.8404 LearningRate 0.0004 Epoch: 18 Global Step: 232810 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:12,192-Speed 3326.09 samples/sec Loss 0.8442 LearningRate 0.0004 Epoch: 18 Global Step: 232820 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:15,290-Speed 3306.31 samples/sec Loss 0.8317 LearningRate 0.0004 Epoch: 18 Global Step: 232830 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:18,494-Speed 3197.46 samples/sec Loss 0.8559 LearningRate 0.0004 Epoch: 18 Global Step: 232840 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:21,573-Speed 3326.56 samples/sec Loss 0.8248 LearningRate 0.0004 Epoch: 18 Global Step: 232850 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:56:24,742-Speed 3232.32 samples/sec Loss 0.8142 LearningRate 0.0004 Epoch: 18 Global Step: 232860 Fp16 Grad Scale: 8192 Required: 1 hours 
Training: 2022-04-27 21:56:27,916-Speed 3226.95 samples/sec Loss 0.8112 LearningRate 0.0004 Epoch: 18 Global Step: 232870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:31,045-Speed 3273.52 samples/sec Loss 0.8282 LearningRate 0.0004 Epoch: 18 Global Step: 232880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:34,139-Speed 3310.80 samples/sec Loss 0.8736 LearningRate 0.0004 Epoch: 18 Global Step: 232890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:37,245-Speed 3298.14 samples/sec Loss 0.8615 LearningRate 0.0004 Epoch: 18 Global Step: 232900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:40,381-Speed 3265.92 samples/sec Loss 0.8227 LearningRate 0.0004 Epoch: 18 Global Step: 232910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:43,532-Speed 3251.37 samples/sec Loss 0.7905 LearningRate 0.0004 Epoch: 18 Global Step: 232920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:46,621-Speed 3315.27 samples/sec Loss 0.8228 LearningRate 0.0004 Epoch: 18 Global Step: 232930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:49,766-Speed 3257.77 samples/sec Loss 0.8511 LearningRate 0.0004 Epoch: 18 Global Step: 232940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:52,950-Speed 3216.15 samples/sec Loss 0.8167 LearningRate 0.0004 Epoch: 18 Global Step: 232950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:56,103-Speed 3248.84 samples/sec Loss 0.8179 LearningRate 0.0004 Epoch: 18 Global Step: 232960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:56:59,257-Speed 3248.34 samples/sec Loss 0.8274 LearningRate 0.0004 Epoch: 18 Global Step: 232970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:57:02,378-Speed 3282.08 samples/sec Loss 0.8279 LearningRate 0.0004 Epoch: 18 Global Step: 232980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:05,501-Speed 
3279.11 samples/sec Loss 0.7966 LearningRate 0.0004 Epoch: 18 Global Step: 232990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:08,586-Speed 3320.55 samples/sec Loss 0.8422 LearningRate 0.0004 Epoch: 18 Global Step: 233000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:11,720-Speed 3268.42 samples/sec Loss 0.8581 LearningRate 0.0004 Epoch: 18 Global Step: 233010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:14,847-Speed 3275.73 samples/sec Loss 0.8139 LearningRate 0.0004 Epoch: 18 Global Step: 233020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:18,035-Speed 3213.10 samples/sec Loss 0.8380 LearningRate 0.0004 Epoch: 18 Global Step: 233030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:21,137-Speed 3302.14 samples/sec Loss 0.8216 LearningRate 0.0004 Epoch: 18 Global Step: 233040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:24,200-Speed 3344.48 samples/sec Loss 0.8047 LearningRate 0.0004 Epoch: 18 Global Step: 233050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:27,301-Speed 3303.72 samples/sec Loss 0.8544 LearningRate 0.0004 Epoch: 18 Global Step: 233060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:30,418-Speed 3286.19 samples/sec Loss 0.8344 LearningRate 0.0004 Epoch: 18 Global Step: 233070 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:33,493-Speed 3331.05 samples/sec Loss 0.8621 LearningRate 0.0004 Epoch: 18 Global Step: 233080 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:36,629-Speed 3266.37 samples/sec Loss 0.8435 LearningRate 0.0004 Epoch: 18 Global Step: 233090 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:39,745-Speed 3287.22 samples/sec Loss 0.8128 LearningRate 0.0004 Epoch: 18 Global Step: 233100 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:42,902-Speed 3244.36 samples/sec Loss 0.8386 LearningRate 
0.0004 Epoch: 18 Global Step: 233110 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:46,003-Speed 3303.27 samples/sec Loss 0.8080 LearningRate 0.0004 Epoch: 18 Global Step: 233120 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:49,168-Speed 3235.70 samples/sec Loss 0.8267 LearningRate 0.0004 Epoch: 18 Global Step: 233130 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:52,295-Speed 3276.90 samples/sec Loss 0.8468 LearningRate 0.0004 Epoch: 18 Global Step: 233140 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:57:55,416-Speed 3282.34 samples/sec Loss 0.8510 LearningRate 0.0004 Epoch: 18 Global Step: 233150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:57:58,536-Speed 3282.52 samples/sec Loss 0.8196 LearningRate 0.0004 Epoch: 18 Global Step: 233160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:58:01,633-Speed 3307.78 samples/sec Loss 0.8099 LearningRate 0.0004 Epoch: 18 Global Step: 233170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:58:04,705-Speed 3334.23 samples/sec Loss 0.8010 LearningRate 0.0004 Epoch: 18 Global Step: 233180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:58:07,802-Speed 3307.86 samples/sec Loss 0.8205 LearningRate 0.0004 Epoch: 18 Global Step: 233190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:58:10,897-Speed 3309.70 samples/sec Loss 0.8314 LearningRate 0.0004 Epoch: 18 Global Step: 233200 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:14,037-Speed 3261.93 samples/sec Loss 0.8207 LearningRate 0.0004 Epoch: 18 Global Step: 233210 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:17,229-Speed 3209.29 samples/sec Loss 0.8443 LearningRate 0.0004 Epoch: 18 Global Step: 233220 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:20,335-Speed 3297.52 samples/sec Loss 0.7940 LearningRate 0.0004 Epoch: 18 Global Step: 233230 Fp16 Grad 
Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:23,423-Speed 3316.27 samples/sec Loss 0.8395 LearningRate 0.0004 Epoch: 18 Global Step: 233240 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:26,534-Speed 3293.27 samples/sec Loss 0.8063 LearningRate 0.0004 Epoch: 18 Global Step: 233250 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:29,718-Speed 3216.79 samples/sec Loss 0.8167 LearningRate 0.0004 Epoch: 18 Global Step: 233260 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:32,877-Speed 3243.11 samples/sec Loss 0.8077 LearningRate 0.0004 Epoch: 18 Global Step: 233270 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:36,094-Speed 3183.38 samples/sec Loss 0.8115 LearningRate 0.0004 Epoch: 18 Global Step: 233280 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:39,236-Speed 3260.52 samples/sec Loss 0.8418 LearningRate 0.0004 Epoch: 18 Global Step: 233290 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:42,429-Speed 3207.97 samples/sec Loss 0.8207 LearningRate 0.0004 Epoch: 18 Global Step: 233300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:58:45,562-Speed 3269.33 samples/sec Loss 0.8303 LearningRate 0.0004 Epoch: 18 Global Step: 233310 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:48,751-Speed 3213.08 samples/sec Loss 0.8490 LearningRate 0.0004 Epoch: 18 Global Step: 233320 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:51,925-Speed 3227.01 samples/sec Loss 0.8472 LearningRate 0.0004 Epoch: 18 Global Step: 233330 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:55,041-Speed 3287.01 samples/sec Loss 0.8573 LearningRate 0.0004 Epoch: 18 Global Step: 233340 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:58:58,148-Speed 3297.10 samples/sec Loss 0.8498 LearningRate 0.0004 Epoch: 18 Global Step: 233350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 
21:59:01,259-Speed 3292.86 samples/sec Loss 0.8229 LearningRate 0.0004 Epoch: 18 Global Step: 233360 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:59:04,409-Speed 3250.80 samples/sec Loss 0.8317 LearningRate 0.0004 Epoch: 18 Global Step: 233370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:59:07,509-Speed 3305.07 samples/sec Loss 0.8041 LearningRate 0.0004 Epoch: 18 Global Step: 233380 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:59:10,650-Speed 3261.06 samples/sec Loss 0.7818 LearningRate 0.0004 Epoch: 18 Global Step: 233390 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:59:13,809-Speed 3242.48 samples/sec Loss 0.8551 LearningRate 0.0004 Epoch: 18 Global Step: 233400 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 21:59:16,904-Speed 3309.01 samples/sec Loss 0.8562 LearningRate 0.0004 Epoch: 18 Global Step: 233410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:20,020-Speed 3287.12 samples/sec Loss 0.8128 LearningRate 0.0004 Epoch: 18 Global Step: 233420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:23,142-Speed 3281.63 samples/sec Loss 0.8397 LearningRate 0.0004 Epoch: 18 Global Step: 233430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:26,244-Speed 3302.06 samples/sec Loss 0.8760 LearningRate 0.0004 Epoch: 18 Global Step: 233440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:29,486-Speed 3158.71 samples/sec Loss 0.8133 LearningRate 0.0004 Epoch: 18 Global Step: 233450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:32,575-Speed 3316.19 samples/sec Loss 0.8370 LearningRate 0.0004 Epoch: 18 Global Step: 233460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:35,664-Speed 3315.82 samples/sec Loss 0.7882 LearningRate 0.0004 Epoch: 18 Global Step: 233470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:38,788-Speed 3278.84 samples/sec Loss 
0.8032 LearningRate 0.0004 Epoch: 18 Global Step: 233480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:41,936-Speed 3254.44 samples/sec Loss 0.8359 LearningRate 0.0004 Epoch: 18 Global Step: 233490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:45,043-Speed 3296.25 samples/sec Loss 0.8299 LearningRate 0.0004 Epoch: 18 Global Step: 233500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 21:59:48,178-Speed 3268.02 samples/sec Loss 0.7900 LearningRate 0.0004 Epoch: 18 Global Step: 233510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:59:51,286-Speed 3295.06 samples/sec Loss 0.8392 LearningRate 0.0004 Epoch: 18 Global Step: 233520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:59:54,377-Speed 3314.25 samples/sec Loss 0.8156 LearningRate 0.0004 Epoch: 18 Global Step: 233530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 21:59:57,450-Speed 3333.57 samples/sec Loss 0.7785 LearningRate 0.0004 Epoch: 18 Global Step: 233540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:00,535-Speed 3319.93 samples/sec Loss 0.8484 LearningRate 0.0004 Epoch: 18 Global Step: 233550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:03,617-Speed 3324.03 samples/sec Loss 0.8147 LearningRate 0.0004 Epoch: 18 Global Step: 233560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:06,759-Speed 3260.06 samples/sec Loss 0.7902 LearningRate 0.0004 Epoch: 18 Global Step: 233570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:09,823-Speed 3342.53 samples/sec Loss 0.8671 LearningRate 0.0004 Epoch: 18 Global Step: 233580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:12,949-Speed 3277.12 samples/sec Loss 0.8301 LearningRate 0.0004 Epoch: 18 Global Step: 233590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:16,033-Speed 3321.68 samples/sec Loss 0.8275 LearningRate 0.0004 Epoch: 18 Global 
Step: 233600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:19,127-Speed 3310.53 samples/sec Loss 0.8099 LearningRate 0.0004 Epoch: 18 Global Step: 233610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:22,221-Speed 3310.68 samples/sec Loss 0.8217 LearningRate 0.0004 Epoch: 18 Global Step: 233620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:25,356-Speed 3267.84 samples/sec Loss 0.8328 LearningRate 0.0004 Epoch: 18 Global Step: 233630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:28,478-Speed 3280.82 samples/sec Loss 0.7669 LearningRate 0.0004 Epoch: 18 Global Step: 233640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:31,612-Speed 3268.57 samples/sec Loss 0.8139 LearningRate 0.0004 Epoch: 18 Global Step: 233650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:34,695-Speed 3322.18 samples/sec Loss 0.8346 LearningRate 0.0004 Epoch: 18 Global Step: 233660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:37,857-Speed 3239.58 samples/sec Loss 0.8420 LearningRate 0.0004 Epoch: 18 Global Step: 233670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:41,087-Speed 3170.84 samples/sec Loss 0.8435 LearningRate 0.0004 Epoch: 18 Global Step: 233680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:44,200-Speed 3291.19 samples/sec Loss 0.8165 LearningRate 0.0004 Epoch: 18 Global Step: 233690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:47,321-Speed 3281.43 samples/sec Loss 0.8534 LearningRate 0.0004 Epoch: 18 Global Step: 233700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:00:50,472-Speed 3250.64 samples/sec Loss 0.8451 LearningRate 0.0004 Epoch: 18 Global Step: 233710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:53,629-Speed 3245.45 samples/sec Loss 0.8323 LearningRate 0.0004 Epoch: 18 Global Step: 233720 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-27 22:00:56,716-Speed 3317.93 samples/sec Loss 0.8600 LearningRate 0.0003 Epoch: 18 Global Step: 233730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:00:59,806-Speed 3315.42 samples/sec Loss 0.8407 LearningRate 0.0003 Epoch: 18 Global Step: 233740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:01:02,892-Speed 3318.84 samples/sec Loss 0.8218 LearningRate 0.0003 Epoch: 18 Global Step: 233750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:01:05,992-Speed 3304.52 samples/sec Loss 0.8053 LearningRate 0.0003 Epoch: 18 Global Step: 233760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:01:09,098-Speed 3297.72 samples/sec Loss 0.8262 LearningRate 0.0003 Epoch: 18 Global Step: 233770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:01:12,254-Speed 3245.80 samples/sec Loss 0.8042 LearningRate 0.0003 Epoch: 18 Global Step: 233780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:01:15,379-Speed 3277.71 samples/sec Loss 0.8605 LearningRate 0.0003 Epoch: 18 Global Step: 233790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:18,539-Speed 3241.11 samples/sec Loss 0.8488 LearningRate 0.0003 Epoch: 18 Global Step: 233800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:21,617-Speed 3329.29 samples/sec Loss 0.8265 LearningRate 0.0003 Epoch: 18 Global Step: 233810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:24,740-Speed 3279.86 samples/sec Loss 0.8490 LearningRate 0.0003 Epoch: 18 Global Step: 233820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:27,935-Speed 3206.02 samples/sec Loss 0.8429 LearningRate 0.0003 Epoch: 18 Global Step: 233830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:31,106-Speed 3230.78 samples/sec Loss 0.7942 LearningRate 0.0003 Epoch: 18 Global Step: 233840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
22:01:34,160-Speed 3354.25 samples/sec Loss 0.8566 LearningRate 0.0003 Epoch: 18 Global Step: 233850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:37,262-Speed 3302.37 samples/sec Loss 0.8536 LearningRate 0.0003 Epoch: 18 Global Step: 233860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:40,356-Speed 3309.85 samples/sec Loss 0.7697 LearningRate 0.0003 Epoch: 18 Global Step: 233870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:43,437-Speed 3325.66 samples/sec Loss 0.8594 LearningRate 0.0003 Epoch: 18 Global Step: 233880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:46,535-Speed 3305.79 samples/sec Loss 0.8155 LearningRate 0.0003 Epoch: 18 Global Step: 233890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:01:49,641-Speed 3297.75 samples/sec Loss 0.8430 LearningRate 0.0003 Epoch: 18 Global Step: 233900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:01:52,718-Speed 3329.64 samples/sec Loss 0.8376 LearningRate 0.0003 Epoch: 18 Global Step: 233910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:55,831-Speed 3289.89 samples/sec Loss 0.8399 LearningRate 0.0003 Epoch: 18 Global Step: 233920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:01:58,925-Speed 3310.79 samples/sec Loss 0.8178 LearningRate 0.0003 Epoch: 18 Global Step: 233930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:02,036-Speed 3293.44 samples/sec Loss 0.8295 LearningRate 0.0003 Epoch: 18 Global Step: 233940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:05,146-Speed 3293.25 samples/sec Loss 0.8037 LearningRate 0.0003 Epoch: 18 Global Step: 233950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:08,227-Speed 3324.39 samples/sec Loss 0.8036 LearningRate 0.0003 Epoch: 18 Global Step: 233960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:11,304-Speed 3329.97 samples/sec Loss 
0.8005 LearningRate 0.0003 Epoch: 18 Global Step: 233970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:14,435-Speed 3271.09 samples/sec Loss 0.8086 LearningRate 0.0003 Epoch: 18 Global Step: 233980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:17,532-Speed 3307.32 samples/sec Loss 0.8218 LearningRate 0.0003 Epoch: 18 Global Step: 233990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:20,642-Speed 3293.31 samples/sec Loss 0.8141 LearningRate 0.0003 Epoch: 18 Global Step: 234000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:23,764-Speed 3282.20 samples/sec Loss 0.8249 LearningRate 0.0003 Epoch: 18 Global Step: 234010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:02:26,803-Speed 3370.54 samples/sec Loss 0.8435 LearningRate 0.0003 Epoch: 18 Global Step: 234020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:29,857-Speed 3353.17 samples/sec Loss 0.8144 LearningRate 0.0003 Epoch: 18 Global Step: 234030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:32,925-Speed 3339.32 samples/sec Loss 0.8051 LearningRate 0.0003 Epoch: 18 Global Step: 234040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:36,064-Speed 3263.58 samples/sec Loss 0.8237 LearningRate 0.0003 Epoch: 18 Global Step: 234050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:39,119-Speed 3352.76 samples/sec Loss 0.8326 LearningRate 0.0003 Epoch: 18 Global Step: 234060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:42,191-Speed 3334.44 samples/sec Loss 0.8393 LearningRate 0.0003 Epoch: 18 Global Step: 234070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:45,239-Speed 3361.06 samples/sec Loss 0.7827 LearningRate 0.0003 Epoch: 18 Global Step: 234080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:48,315-Speed 3329.55 samples/sec Loss 0.8446 LearningRate 0.0003 Epoch: 18 Global 
Step: 234090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:51,403-Speed 3317.22 samples/sec Loss 0.7984 LearningRate 0.0003 Epoch: 18 Global Step: 234100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:54,505-Speed 3301.90 samples/sec Loss 0.8186 LearningRate 0.0003 Epoch: 18 Global Step: 234110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:02:57,617-Speed 3291.80 samples/sec Loss 0.8257 LearningRate 0.0003 Epoch: 18 Global Step: 234120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:03:00,709-Speed 3312.14 samples/sec Loss 0.8017 LearningRate 0.0003 Epoch: 18 Global Step: 234130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:03:03,836-Speed 3276.39 samples/sec Loss 0.8275 LearningRate 0.0003 Epoch: 18 Global Step: 234140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:03:06,944-Speed 3296.23 samples/sec Loss 0.8710 LearningRate 0.0003 Epoch: 18 Global Step: 234150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:03:10,018-Speed 3331.53 samples/sec Loss 0.8267 LearningRate 0.0003 Epoch: 18 Global Step: 234160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:03:13,142-Speed 3279.83 samples/sec Loss 0.8490 LearningRate 0.0003 Epoch: 18 Global Step: 234170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:03:16,175-Speed 3377.08 samples/sec Loss 0.8144 LearningRate 0.0003 Epoch: 18 Global Step: 234180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:19,240-Speed 3342.16 samples/sec Loss 0.8355 LearningRate 0.0003 Epoch: 18 Global Step: 234190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:22,376-Speed 3265.83 samples/sec Loss 0.8438 LearningRate 0.0003 Epoch: 18 Global Step: 234200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:25,472-Speed 3308.90 samples/sec Loss 0.8684 LearningRate 0.0003 Epoch: 18 Global Step: 234210 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-27 22:03:28,573-Speed 3302.69 samples/sec Loss 0.8403 LearningRate 0.0003 Epoch: 18 Global Step: 234220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:31,690-Speed 3286.29 samples/sec Loss 0.7943 LearningRate 0.0003 Epoch: 18 Global Step: 234230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:34,791-Speed 3303.23 samples/sec Loss 0.7847 LearningRate 0.0003 Epoch: 18 Global Step: 234240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:37,892-Speed 3303.38 samples/sec Loss 0.8167 LearningRate 0.0003 Epoch: 18 Global Step: 234250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:41,005-Speed 3290.25 samples/sec Loss 0.8324 LearningRate 0.0003 Epoch: 18 Global Step: 234260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:44,111-Speed 3298.42 samples/sec Loss 0.8388 LearningRate 0.0003 Epoch: 18 Global Step: 234270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:03:47,188-Speed 3328.65 samples/sec Loss 0.8231 LearningRate 0.0003 Epoch: 18 Global Step: 234280 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:03:50,277-Speed 3315.95 samples/sec Loss 0.8283 LearningRate 0.0003 Epoch: 18 Global Step: 234290 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:03:53,389-Speed 3292.15 samples/sec Loss 0.8175 LearningRate 0.0003 Epoch: 18 Global Step: 234300 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:03:56,454-Speed 3342.08 samples/sec Loss 0.8128 LearningRate 0.0003 Epoch: 18 Global Step: 234310 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:03:59,603-Speed 3252.72 samples/sec Loss 0.8354 LearningRate 0.0003 Epoch: 18 Global Step: 234320 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:02,693-Speed 3315.34 samples/sec Loss 0.8115 LearningRate 0.0003 Epoch: 18 Global Step: 234330 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 
22:04:05,764-Speed 3334.70 samples/sec Loss 0.8159 LearningRate 0.0003 Epoch: 18 Global Step: 234340 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:08,834-Speed 3337.36 samples/sec Loss 0.8086 LearningRate 0.0003 Epoch: 18 Global Step: 234350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:11,967-Speed 3269.47 samples/sec Loss 0.8007 LearningRate 0.0003 Epoch: 18 Global Step: 234360 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:15,048-Speed 3323.91 samples/sec Loss 0.8204 LearningRate 0.0003 Epoch: 18 Global Step: 234370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:18,153-Speed 3299.43 samples/sec Loss 0.8181 LearningRate 0.0003 Epoch: 18 Global Step: 234380 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:21,218-Speed 3341.89 samples/sec Loss 0.8333 LearningRate 0.0003 Epoch: 18 Global Step: 234390 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:24,301-Speed 3322.95 samples/sec Loss 0.7883 LearningRate 0.0003 Epoch: 18 Global Step: 234400 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:27,414-Speed 3290.55 samples/sec Loss 0.8670 LearningRate 0.0003 Epoch: 18 Global Step: 234410 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:30,583-Speed 3231.93 samples/sec Loss 0.8182 LearningRate 0.0003 Epoch: 18 Global Step: 234420 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:33,650-Speed 3340.10 samples/sec Loss 0.8320 LearningRate 0.0003 Epoch: 18 Global Step: 234430 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:36,787-Speed 3266.10 samples/sec Loss 0.8841 LearningRate 0.0003 Epoch: 18 Global Step: 234440 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:39,924-Speed 3264.73 samples/sec Loss 0.8438 LearningRate 0.0003 Epoch: 18 Global Step: 234450 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:43,024-Speed 3303.94 samples/sec Loss 0.8421 
LearningRate 0.0003 Epoch: 18 Global Step: 234460 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:46,129-Speed 3299.13 samples/sec Loss 0.8079 LearningRate 0.0003 Epoch: 18 Global Step: 234470 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:04:49,261-Speed 3270.53 samples/sec Loss 0.8434 LearningRate 0.0003 Epoch: 18 Global Step: 234480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:04:52,404-Speed 3258.74 samples/sec Loss 0.7865 LearningRate 0.0003 Epoch: 18 Global Step: 234490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:04:55,598-Speed 3207.32 samples/sec Loss 0.8072 LearningRate 0.0003 Epoch: 18 Global Step: 234500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:04:58,719-Speed 3282.26 samples/sec Loss 0.8088 LearningRate 0.0003 Epoch: 18 Global Step: 234510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:01,789-Speed 3336.56 samples/sec Loss 0.8121 LearningRate 0.0003 Epoch: 18 Global Step: 234520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:04,915-Speed 3276.86 samples/sec Loss 0.8128 LearningRate 0.0003 Epoch: 18 Global Step: 234530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:07,961-Speed 3363.74 samples/sec Loss 0.8420 LearningRate 0.0003 Epoch: 18 Global Step: 234540 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:11,084-Speed 3279.59 samples/sec Loss 0.7938 LearningRate 0.0003 Epoch: 18 Global Step: 234550 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:14,174-Speed 3315.72 samples/sec Loss 0.8339 LearningRate 0.0003 Epoch: 18 Global Step: 234560 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:17,330-Speed 3244.65 samples/sec Loss 0.8163 LearningRate 0.0003 Epoch: 18 Global Step: 234570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:20,483-Speed 3249.28 samples/sec Loss 0.8534 LearningRate 0.0003 Epoch: 18 Global Step: 234580 
Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:23,573-Speed 3314.00 samples/sec Loss 0.8176 LearningRate 0.0003 Epoch: 18 Global Step: 234590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:26,702-Speed 3274.78 samples/sec Loss 0.8151 LearningRate 0.0003 Epoch: 18 Global Step: 234600 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:29,819-Speed 3285.93 samples/sec Loss 0.8189 LearningRate 0.0003 Epoch: 18 Global Step: 234610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:32,951-Speed 3270.56 samples/sec Loss 0.8322 LearningRate 0.0003 Epoch: 18 Global Step: 234620 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:36,038-Speed 3318.46 samples/sec Loss 0.8279 LearningRate 0.0003 Epoch: 18 Global Step: 234630 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:05:39,150-Speed 3291.56 samples/sec Loss 0.8143 LearningRate 0.0003 Epoch: 18 Global Step: 234640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:42,320-Speed 3230.59 samples/sec Loss 0.8132 LearningRate 0.0003 Epoch: 18 Global Step: 234650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:45,402-Speed 3324.32 samples/sec Loss 0.8318 LearningRate 0.0003 Epoch: 18 Global Step: 234660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:48,632-Speed 3170.89 samples/sec Loss 0.7967 LearningRate 0.0003 Epoch: 18 Global Step: 234670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:51,722-Speed 3314.89 samples/sec Loss 0.8280 LearningRate 0.0003 Epoch: 18 Global Step: 234680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:54,811-Speed 3316.15 samples/sec Loss 0.7968 LearningRate 0.0003 Epoch: 18 Global Step: 234690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:05:57,891-Speed 3326.23 samples/sec Loss 0.8376 LearningRate 0.0003 Epoch: 18 Global Step: 234700 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-27 22:06:01,005-Speed 3288.36 samples/sec Loss 0.8552 LearningRate 0.0003 Epoch: 18 Global Step: 234710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:04,207-Speed 3199.28 samples/sec Loss 0.8636 LearningRate 0.0003 Epoch: 18 Global Step: 234720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:07,327-Speed 3283.65 samples/sec Loss 0.8376 LearningRate 0.0003 Epoch: 18 Global Step: 234730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:10,450-Speed 3279.73 samples/sec Loss 0.8200 LearningRate 0.0003 Epoch: 18 Global Step: 234740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:06:13,536-Speed 3319.57 samples/sec Loss 0.8372 LearningRate 0.0003 Epoch: 18 Global Step: 234750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:06:16,654-Speed 3284.84 samples/sec Loss 0.8323 LearningRate 0.0003 Epoch: 18 Global Step: 234760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:06:19,748-Speed 3310.29 samples/sec Loss 0.8154 LearningRate 0.0003 Epoch: 18 Global Step: 234770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:06:22,829-Speed 3324.84 samples/sec Loss 0.8131 LearningRate 0.0003 Epoch: 18 Global Step: 234780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:25,939-Speed 3293.74 samples/sec Loss 0.8488 LearningRate 0.0003 Epoch: 18 Global Step: 234790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:29,056-Speed 3285.90 samples/sec Loss 0.7993 LearningRate 0.0003 Epoch: 18 Global Step: 234800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:32,157-Speed 3303.80 samples/sec Loss 0.8428 LearningRate 0.0003 Epoch: 18 Global Step: 234810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:35,282-Speed 3277.15 samples/sec Loss 0.8440 LearningRate 0.0003 Epoch: 18 Global Step: 234820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:38,388-Speed 
3298.49 samples/sec Loss 0.8321 LearningRate 0.0003 Epoch: 18 Global Step: 234830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:41,525-Speed 3265.02 samples/sec Loss 0.8153 LearningRate 0.0003 Epoch: 18 Global Step: 234840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:44,654-Speed 3273.93 samples/sec Loss 0.8449 LearningRate 0.0003 Epoch: 18 Global Step: 234850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:47,852-Speed 3203.04 samples/sec Loss 0.8195 LearningRate 0.0003 Epoch: 18 Global Step: 234860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:51,027-Speed 3226.05 samples/sec Loss 0.8632 LearningRate 0.0003 Epoch: 18 Global Step: 234870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:06:54,134-Speed 3296.74 samples/sec Loss 0.7964 LearningRate 0.0003 Epoch: 18 Global Step: 234880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:06:57,193-Speed 3348.68 samples/sec Loss 0.8233 LearningRate 0.0003 Epoch: 18 Global Step: 234890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:00,396-Speed 3197.55 samples/sec Loss 0.8284 LearningRate 0.0003 Epoch: 18 Global Step: 234900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:03,509-Speed 3290.89 samples/sec Loss 0.8263 LearningRate 0.0003 Epoch: 18 Global Step: 234910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:06,647-Speed 3264.38 samples/sec Loss 0.7912 LearningRate 0.0003 Epoch: 18 Global Step: 234920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:09,719-Speed 3333.82 samples/sec Loss 0.8276 LearningRate 0.0003 Epoch: 18 Global Step: 234930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:12,877-Speed 3243.86 samples/sec Loss 0.8129 LearningRate 0.0003 Epoch: 18 Global Step: 234940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:15,981-Speed 3300.12 samples/sec Loss 0.8237 
LearningRate 0.0003 Epoch: 18 Global Step: 234950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:19,069-Speed 3317.10 samples/sec Loss 0.8037 LearningRate 0.0003 Epoch: 18 Global Step: 234960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:22,136-Speed 3339.71 samples/sec Loss 0.8338 LearningRate 0.0003 Epoch: 18 Global Step: 234970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:25,267-Speed 3271.35 samples/sec Loss 0.8091 LearningRate 0.0003 Epoch: 18 Global Step: 234980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:28,386-Speed 3284.77 samples/sec Loss 0.8199 LearningRate 0.0003 Epoch: 18 Global Step: 234990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:31,470-Speed 3320.99 samples/sec Loss 0.8149 LearningRate 0.0003 Epoch: 18 Global Step: 235000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:34,575-Speed 3298.44 samples/sec Loss 0.8377 LearningRate 0.0003 Epoch: 18 Global Step: 235010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:37,733-Speed 3244.43 samples/sec Loss 0.7933 LearningRate 0.0003 Epoch: 18 Global Step: 235020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:40,850-Speed 3285.89 samples/sec Loss 0.8441 LearningRate 0.0003 Epoch: 18 Global Step: 235030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:43,971-Speed 3282.36 samples/sec Loss 0.8082 LearningRate 0.0003 Epoch: 18 Global Step: 235040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:47,089-Speed 3284.64 samples/sec Loss 0.8292 LearningRate 0.0003 Epoch: 18 Global Step: 235050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:50,180-Speed 3314.02 samples/sec Loss 0.8344 LearningRate 0.0003 Epoch: 18 Global Step: 235060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:53,364-Speed 3217.25 samples/sec Loss 0.7959 LearningRate 0.0003 Epoch: 18 Global Step: 
235070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:56,459-Speed 3310.00 samples/sec Loss 0.7671 LearningRate 0.0003 Epoch: 18 Global Step: 235080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:07:59,556-Speed 3307.31 samples/sec Loss 0.8167 LearningRate 0.0003 Epoch: 18 Global Step: 235090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:08:02,646-Speed 3315.38 samples/sec Loss 0.8281 LearningRate 0.0003 Epoch: 18 Global Step: 235100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:08:05,729-Speed 3322.57 samples/sec Loss 0.8346 LearningRate 0.0003 Epoch: 18 Global Step: 235110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:08,848-Speed 3283.84 samples/sec Loss 0.8093 LearningRate 0.0003 Epoch: 18 Global Step: 235120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:11,975-Speed 3275.51 samples/sec Loss 0.8322 LearningRate 0.0003 Epoch: 18 Global Step: 235130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:15,103-Speed 3274.85 samples/sec Loss 0.8400 LearningRate 0.0003 Epoch: 18 Global Step: 235140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:18,201-Speed 3306.97 samples/sec Loss 0.7903 LearningRate 0.0003 Epoch: 18 Global Step: 235150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:21,272-Speed 3334.90 samples/sec Loss 0.8475 LearningRate 0.0003 Epoch: 18 Global Step: 235160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:24,355-Speed 3322.85 samples/sec Loss 0.8355 LearningRate 0.0003 Epoch: 18 Global Step: 235170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:27,442-Speed 3317.69 samples/sec Loss 0.8202 LearningRate 0.0003 Epoch: 18 Global Step: 235180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:30,621-Speed 3222.18 samples/sec Loss 0.8115 LearningRate 0.0003 Epoch: 18 Global Step: 235190 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-04-27 22:08:33,700-Speed 3326.57 samples/sec Loss 0.8206 LearningRate 0.0003 Epoch: 18 Global Step: 235200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:36,788-Speed 3317.20 samples/sec Loss 0.8155 LearningRate 0.0003 Epoch: 18 Global Step: 235210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:08:39,957-Speed 3232.42 samples/sec Loss 0.8521 LearningRate 0.0003 Epoch: 18 Global Step: 235220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:08:43,022-Speed 3342.15 samples/sec Loss 0.8380 LearningRate 0.0003 Epoch: 18 Global Step: 235230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:46,075-Speed 3355.39 samples/sec Loss 0.8080 LearningRate 0.0003 Epoch: 18 Global Step: 235240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:49,160-Speed 3320.16 samples/sec Loss 0.7848 LearningRate 0.0003 Epoch: 18 Global Step: 235250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:52,263-Speed 3301.70 samples/sec Loss 0.8069 LearningRate 0.0003 Epoch: 18 Global Step: 235260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:55,339-Speed 3330.09 samples/sec Loss 0.8413 LearningRate 0.0003 Epoch: 18 Global Step: 235270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:08:58,434-Speed 3308.99 samples/sec Loss 0.7963 LearningRate 0.0003 Epoch: 18 Global Step: 235280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:01,605-Speed 3230.71 samples/sec Loss 0.7650 LearningRate 0.0003 Epoch: 18 Global Step: 235290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:04,799-Speed 3207.06 samples/sec Loss 0.8274 LearningRate 0.0003 Epoch: 18 Global Step: 235300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:07,891-Speed 3312.49 samples/sec Loss 0.8711 LearningRate 0.0003 Epoch: 18 Global Step: 235310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
22:09:10,992-Speed 3303.58 samples/sec Loss 0.8300 LearningRate 0.0003 Epoch: 18 Global Step: 235320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:14,093-Speed 3302.99 samples/sec Loss 0.8344 LearningRate 0.0003 Epoch: 18 Global Step: 235330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:17,251-Speed 3244.07 samples/sec Loss 0.8229 LearningRate 0.0003 Epoch: 18 Global Step: 235340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:20,321-Speed 3336.25 samples/sec Loss 0.8272 LearningRate 0.0003 Epoch: 18 Global Step: 235350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:23,459-Speed 3264.66 samples/sec Loss 0.7923 LearningRate 0.0003 Epoch: 18 Global Step: 235360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:26,523-Speed 3342.63 samples/sec Loss 0.7631 LearningRate 0.0003 Epoch: 18 Global Step: 235370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:29,587-Speed 3343.41 samples/sec Loss 0.8385 LearningRate 0.0003 Epoch: 18 Global Step: 235380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:32,642-Speed 3352.58 samples/sec Loss 0.8188 LearningRate 0.0003 Epoch: 18 Global Step: 235390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:35,732-Speed 3315.63 samples/sec Loss 0.8540 LearningRate 0.0003 Epoch: 18 Global Step: 235400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:38,832-Speed 3303.61 samples/sec Loss 0.8333 LearningRate 0.0003 Epoch: 18 Global Step: 235410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:09:41,916-Speed 3322.02 samples/sec Loss 0.7878 LearningRate 0.0003 Epoch: 18 Global Step: 235420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:44,963-Speed 3361.62 samples/sec Loss 0.8349 LearningRate 0.0003 Epoch: 18 Global Step: 235430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:48,028-Speed 3342.54 samples/sec Loss 
0.8432 LearningRate 0.0003 Epoch: 18 Global Step: 235440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:51,120-Speed 3311.90 samples/sec Loss 0.8368 LearningRate 0.0003 Epoch: 18 Global Step: 235450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:54,187-Speed 3340.53 samples/sec Loss 0.8203 LearningRate 0.0003 Epoch: 18 Global Step: 235460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:09:57,265-Speed 3327.96 samples/sec Loss 0.8063 LearningRate 0.0003 Epoch: 18 Global Step: 235470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:00,332-Speed 3339.60 samples/sec Loss 0.8116 LearningRate 0.0003 Epoch: 18 Global Step: 235480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:03,444-Speed 3292.00 samples/sec Loss 0.8055 LearningRate 0.0003 Epoch: 18 Global Step: 235490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:06,502-Speed 3348.77 samples/sec Loss 0.8481 LearningRate 0.0003 Epoch: 18 Global Step: 235500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:09,593-Speed 3314.81 samples/sec Loss 0.8743 LearningRate 0.0003 Epoch: 18 Global Step: 235510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:12,746-Speed 3248.55 samples/sec Loss 0.8117 LearningRate 0.0003 Epoch: 18 Global Step: 235520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:10:15,850-Speed 3299.34 samples/sec Loss 0.8157 LearningRate 0.0003 Epoch: 18 Global Step: 235530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:10:18,927-Speed 3329.02 samples/sec Loss 0.8230 LearningRate 0.0003 Epoch: 18 Global Step: 235540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:10:21,988-Speed 3347.09 samples/sec Loss 0.8247 LearningRate 0.0003 Epoch: 18 Global Step: 235550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:25,073-Speed 3319.29 samples/sec Loss 0.8174 LearningRate 0.0003 Epoch: 18 Global 
Step: 235560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:28,172-Speed 3305.55 samples/sec Loss 0.8459 LearningRate 0.0003 Epoch: 18 Global Step: 235570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:31,314-Speed 3260.78 samples/sec Loss 0.8500 LearningRate 0.0003 Epoch: 18 Global Step: 235580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:34,403-Speed 3315.96 samples/sec Loss 0.8089 LearningRate 0.0003 Epoch: 18 Global Step: 235590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:37,542-Speed 3263.26 samples/sec Loss 0.8291 LearningRate 0.0003 Epoch: 18 Global Step: 235600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:10:40,595-Speed 3354.40 samples/sec Loss 0.8036 LearningRate 0.0003 Epoch: 18 Global Step: 235610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:10:43,667-Speed 3334.48 samples/sec Loss 0.8129 LearningRate 0.0003 Epoch: 18 Global Step: 235620 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:10:46,791-Speed 3279.41 samples/sec Loss 0.8373 LearningRate 0.0003 Epoch: 18 Global Step: 235630 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:10:49,904-Speed 3290.19 samples/sec Loss 0.8036 LearningRate 0.0003 Epoch: 18 Global Step: 235640 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:10:52,962-Speed 3349.63 samples/sec Loss 0.8114 LearningRate 0.0003 Epoch: 18 Global Step: 235650 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:10:56,077-Speed 3287.78 samples/sec Loss 0.8084 LearningRate 0.0003 Epoch: 18 Global Step: 235660 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:10:59,167-Speed 3315.43 samples/sec Loss 0.8208 LearningRate 0.0003 Epoch: 18 Global Step: 235670 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:11:02,253-Speed 3319.52 samples/sec Loss 0.8105 LearningRate 0.0003 Epoch: 18 Global Step: 235680 Fp16 Grad Scale: 8192 Required: 1 
hours Training: 2022-04-27 22:11:05,321-Speed 3338.73 samples/sec Loss 0.8505 LearningRate 0.0003 Epoch: 18 Global Step: 235690 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:11:08,409-Speed 3317.18 samples/sec Loss 0.8085 LearningRate 0.0003 Epoch: 18 Global Step: 235700 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:11:11,482-Speed 3333.23 samples/sec Loss 0.8043 LearningRate 0.0003 Epoch: 18 Global Step: 235710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:14,542-Speed 3347.35 samples/sec Loss 0.8161 LearningRate 0.0003 Epoch: 18 Global Step: 235720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:17,627-Speed 3320.58 samples/sec Loss 0.8123 LearningRate 0.0003 Epoch: 18 Global Step: 235730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:20,701-Speed 3331.21 samples/sec Loss 0.8122 LearningRate 0.0003 Epoch: 18 Global Step: 235740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:23,768-Speed 3340.92 samples/sec Loss 0.8148 LearningRate 0.0003 Epoch: 18 Global Step: 235750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:26,906-Speed 3264.12 samples/sec Loss 0.8073 LearningRate 0.0003 Epoch: 18 Global Step: 235760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:29,975-Speed 3337.41 samples/sec Loss 0.7878 LearningRate 0.0003 Epoch: 18 Global Step: 235770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:33,040-Speed 3341.62 samples/sec Loss 0.8012 LearningRate 0.0003 Epoch: 18 Global Step: 235780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:36,145-Speed 3298.81 samples/sec Loss 0.8044 LearningRate 0.0003 Epoch: 18 Global Step: 235790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:39,270-Speed 3278.22 samples/sec Loss 0.8375 LearningRate 0.0003 Epoch: 18 Global Step: 235800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:42,397-Speed 
3275.37 samples/sec Loss 0.8475 LearningRate 0.0003 Epoch: 18 Global Step: 235810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:11:45,477-Speed 3326.30 samples/sec Loss 0.8102 LearningRate 0.0003 Epoch: 18 Global Step: 235820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:48,615-Speed 3263.97 samples/sec Loss 0.8274 LearningRate 0.0003 Epoch: 18 Global Step: 235830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:51,731-Speed 3287.02 samples/sec Loss 0.8051 LearningRate 0.0003 Epoch: 18 Global Step: 235840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:54,881-Speed 3252.50 samples/sec Loss 0.8087 LearningRate 0.0003 Epoch: 18 Global Step: 235850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:11:57,939-Speed 3349.55 samples/sec Loss 0.8163 LearningRate 0.0003 Epoch: 18 Global Step: 235860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:01,005-Speed 3340.85 samples/sec Loss 0.8208 LearningRate 0.0003 Epoch: 18 Global Step: 235870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:04,128-Speed 3280.30 samples/sec Loss 0.8092 LearningRate 0.0003 Epoch: 18 Global Step: 235880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:07,268-Speed 3261.10 samples/sec Loss 0.8512 LearningRate 0.0003 Epoch: 18 Global Step: 235890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:10,359-Speed 3314.18 samples/sec Loss 0.8229 LearningRate 0.0003 Epoch: 18 Global Step: 235900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:13,496-Speed 3266.01 samples/sec Loss 0.8029 LearningRate 0.0003 Epoch: 18 Global Step: 235910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:16,642-Speed 3255.20 samples/sec Loss 0.8361 LearningRate 0.0003 Epoch: 18 Global Step: 235920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:12:19,775-Speed 3269.49 samples/sec Loss 0.8434 
LearningRate 0.0003 Epoch: 18 Global Step: 235930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:22,924-Speed 3253.36 samples/sec Loss 0.7992 LearningRate 0.0003 Epoch: 18 Global Step: 235940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:26,111-Speed 3214.20 samples/sec Loss 0.8392 LearningRate 0.0003 Epoch: 18 Global Step: 235950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:29,241-Speed 3272.38 samples/sec Loss 0.8076 LearningRate 0.0003 Epoch: 18 Global Step: 235960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:32,409-Speed 3233.15 samples/sec Loss 0.8040 LearningRate 0.0003 Epoch: 18 Global Step: 235970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:12:35,479-Speed 3336.75 samples/sec Loss 0.8558 LearningRate 0.0003 Epoch: 18 Global Step: 235980 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:12:38,800-Speed 3084.41 samples/sec Loss 0.8287 LearningRate 0.0003 Epoch: 18 Global Step: 235990 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:10,211-Speed 326.01 samples/sec Loss 0.8139 LearningRate 0.0002 Epoch: 19 Global Step: 236000 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:13,636-Speed 2991.13 samples/sec Loss 0.7350 LearningRate 0.0002 Epoch: 19 Global Step: 236010 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:16,820-Speed 3217.04 samples/sec Loss 0.7127 LearningRate 0.0002 Epoch: 19 Global Step: 236020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:19,883-Speed 3344.66 samples/sec Loss 0.7515 LearningRate 0.0002 Epoch: 19 Global Step: 236030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:23,082-Speed 3201.51 samples/sec Loss 0.7396 LearningRate 0.0002 Epoch: 19 Global Step: 236040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:26,204-Speed 3281.40 samples/sec Loss 0.6710 LearningRate 0.0002 Epoch: 19 Global Step: 236050 
Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:29,503-Speed 3105.48 samples/sec Loss 0.7264 LearningRate 0.0002 Epoch: 19 Global Step: 236060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:32,599-Speed 3307.75 samples/sec Loss 0.7406 LearningRate 0.0002 Epoch: 19 Global Step: 236070 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:13:35,803-Speed 3197.47 samples/sec Loss 0.7408 LearningRate 0.0002 Epoch: 19 Global Step: 236080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:13:38,953-Speed 3251.99 samples/sec Loss 0.7333 LearningRate 0.0002 Epoch: 19 Global Step: 236090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:13:42,023-Speed 3337.01 samples/sec Loss 0.7422 LearningRate 0.0002 Epoch: 19 Global Step: 236100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:13:45,070-Speed 3361.29 samples/sec Loss 0.7146 LearningRate 0.0002 Epoch: 19 Global Step: 236110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:13:48,203-Speed 3270.00 samples/sec Loss 0.7255 LearningRate 0.0002 Epoch: 19 Global Step: 236120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:13:51,288-Speed 3320.39 samples/sec Loss 0.7151 LearningRate 0.0002 Epoch: 19 Global Step: 236130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:13:54,383-Speed 3309.68 samples/sec Loss 0.7116 LearningRate 0.0002 Epoch: 19 Global Step: 236140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:13:57,434-Speed 3357.40 samples/sec Loss 0.7465 LearningRate 0.0002 Epoch: 19 Global Step: 236150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:00,501-Speed 3339.30 samples/sec Loss 0.7493 LearningRate 0.0002 Epoch: 19 Global Step: 236160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:03,802-Speed 3102.78 samples/sec Loss 0.7215 LearningRate 0.0002 Epoch: 19 Global Step: 236170 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-27 22:14:06,852-Speed 3359.41 samples/sec Loss 0.7327 LearningRate 0.0002 Epoch: 19 Global Step: 236180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:14:09,902-Speed 3357.77 samples/sec Loss 0.7294 LearningRate 0.0002 Epoch: 19 Global Step: 236190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:14:12,985-Speed 3322.64 samples/sec Loss 0.7350 LearningRate 0.0002 Epoch: 19 Global Step: 236200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:14:16,126-Speed 3261.45 samples/sec Loss 0.7216 LearningRate 0.0002 Epoch: 19 Global Step: 236210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:14:19,199-Speed 3332.94 samples/sec Loss 0.7484 LearningRate 0.0002 Epoch: 19 Global Step: 236220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:14:22,247-Speed 3360.78 samples/sec Loss 0.7693 LearningRate 0.0002 Epoch: 19 Global Step: 236230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:14:25,350-Speed 3300.99 samples/sec Loss 0.7371 LearningRate 0.0002 Epoch: 19 Global Step: 236240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:28,457-Speed 3296.55 samples/sec Loss 0.7602 LearningRate 0.0002 Epoch: 19 Global Step: 236250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:31,586-Speed 3274.03 samples/sec Loss 0.7157 LearningRate 0.0002 Epoch: 19 Global Step: 236260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:34,658-Speed 3333.47 samples/sec Loss 0.7372 LearningRate 0.0002 Epoch: 19 Global Step: 236270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:37,745-Speed 3318.82 samples/sec Loss 0.7198 LearningRate 0.0002 Epoch: 19 Global Step: 236280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:40,867-Speed 3281.31 samples/sec Loss 0.7353 LearningRate 0.0002 Epoch: 19 Global Step: 236290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:43,933-Speed 
3340.82 samples/sec Loss 0.7212 LearningRate 0.0002 Epoch: 19 Global Step: 236300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:47,018-Speed 3320.37 samples/sec Loss 0.7619 LearningRate 0.0002 Epoch: 19 Global Step: 236310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:50,135-Speed 3286.04 samples/sec Loss 0.6882 LearningRate 0.0002 Epoch: 19 Global Step: 236320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:14:53,267-Speed 3270.64 samples/sec Loss 0.7741 LearningRate 0.0002 Epoch: 19 Global Step: 236330 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:14:56,346-Speed 3326.90 samples/sec Loss 0.7322 LearningRate 0.0002 Epoch: 19 Global Step: 236340 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:14:59,462-Speed 3287.24 samples/sec Loss 0.7324 LearningRate 0.0002 Epoch: 19 Global Step: 236350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:02,664-Speed 3198.53 samples/sec Loss 0.7439 LearningRate 0.0002 Epoch: 19 Global Step: 236360 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:05,814-Speed 3252.28 samples/sec Loss 0.7486 LearningRate 0.0002 Epoch: 19 Global Step: 236370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:08,869-Speed 3352.82 samples/sec Loss 0.7253 LearningRate 0.0002 Epoch: 19 Global Step: 236380 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:12,036-Speed 3234.48 samples/sec Loss 0.7016 LearningRate 0.0002 Epoch: 19 Global Step: 236390 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:15,243-Speed 3193.86 samples/sec Loss 0.7413 LearningRate 0.0002 Epoch: 19 Global Step: 236400 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:18,439-Speed 3205.23 samples/sec Loss 0.7478 LearningRate 0.0002 Epoch: 19 Global Step: 236410 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:21,496-Speed 3349.97 samples/sec Loss 0.7284 LearningRate 0.0002 
Epoch: 19 Global Step: 236420 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:24,582-Speed 3319.48 samples/sec Loss 0.7040 LearningRate 0.0002 Epoch: 19 Global Step: 236430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:27,830-Speed 3153.33 samples/sec Loss 0.7471 LearningRate 0.0002 Epoch: 19 Global Step: 236440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:30,929-Speed 3306.01 samples/sec Loss 0.7761 LearningRate 0.0002 Epoch: 19 Global Step: 236450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:34,009-Speed 3325.21 samples/sec Loss 0.7203 LearningRate 0.0002 Epoch: 19 Global Step: 236460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:37,144-Speed 3267.72 samples/sec Loss 0.7124 LearningRate 0.0002 Epoch: 19 Global Step: 236470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:40,341-Speed 3203.58 samples/sec Loss 0.7411 LearningRate 0.0002 Epoch: 19 Global Step: 236480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:43,467-Speed 3277.38 samples/sec Loss 0.7517 LearningRate 0.0002 Epoch: 19 Global Step: 236490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:46,541-Speed 3331.73 samples/sec Loss 0.7057 LearningRate 0.0002 Epoch: 19 Global Step: 236500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:49,652-Speed 3292.28 samples/sec Loss 0.7217 LearningRate 0.0002 Epoch: 19 Global Step: 236510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:15:52,813-Speed 3241.03 samples/sec Loss 0.7272 LearningRate 0.0002 Epoch: 19 Global Step: 236520 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:55,969-Speed 3245.39 samples/sec Loss 0.6964 LearningRate 0.0002 Epoch: 19 Global Step: 236530 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:15:59,035-Speed 3340.85 samples/sec Loss 0.7535 LearningRate 0.0002 Epoch: 19 Global Step: 236540 Fp16 Grad Scale: 
8192 Required: 1 hours Training: 2022-04-27 22:16:02,161-Speed 3276.76 samples/sec Loss 0.6815 LearningRate 0.0002 Epoch: 19 Global Step: 236550 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:16:05,316-Speed 3246.58 samples/sec Loss 0.7517 LearningRate 0.0002 Epoch: 19 Global Step: 236560 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:16:08,405-Speed 3316.75 samples/sec Loss 0.7283 LearningRate 0.0002 Epoch: 19 Global Step: 236570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:16:11,456-Speed 3356.81 samples/sec Loss 0.7570 LearningRate 0.0002 Epoch: 19 Global Step: 236580 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:16:14,620-Speed 3237.59 samples/sec Loss 0.6886 LearningRate 0.0002 Epoch: 19 Global Step: 236590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:16:17,676-Speed 3352.28 samples/sec Loss 0.7479 LearningRate 0.0002 Epoch: 19 Global Step: 236600 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:16:20,725-Speed 3358.76 samples/sec Loss 0.7290 LearningRate 0.0002 Epoch: 19 Global Step: 236610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:16:23,774-Speed 3360.32 samples/sec Loss 0.7122 LearningRate 0.0002 Epoch: 19 Global Step: 236620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:26,881-Speed 3297.21 samples/sec Loss 0.7328 LearningRate 0.0002 Epoch: 19 Global Step: 236630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:30,001-Speed 3282.23 samples/sec Loss 0.7247 LearningRate 0.0002 Epoch: 19 Global Step: 236640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:33,087-Speed 3320.38 samples/sec Loss 0.7615 LearningRate 0.0002 Epoch: 19 Global Step: 236650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:36,245-Speed 3243.06 samples/sec Loss 0.7409 LearningRate 0.0002 Epoch: 19 Global Step: 236660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
22:16:39,384-Speed 3262.99 samples/sec Loss 0.7406 LearningRate 0.0002 Epoch: 19 Global Step: 236670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:42,516-Speed 3270.05 samples/sec Loss 0.7492 LearningRate 0.0002 Epoch: 19 Global Step: 236680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:45,614-Speed 3306.31 samples/sec Loss 0.7399 LearningRate 0.0002 Epoch: 19 Global Step: 236690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:48,725-Speed 3292.67 samples/sec Loss 0.7053 LearningRate 0.0002 Epoch: 19 Global Step: 236700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:51,810-Speed 3320.46 samples/sec Loss 0.7384 LearningRate 0.0002 Epoch: 19 Global Step: 236710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:16:54,893-Speed 3322.60 samples/sec Loss 0.7349 LearningRate 0.0002 Epoch: 19 Global Step: 236720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:16:57,933-Speed 3370.30 samples/sec Loss 0.7116 LearningRate 0.0002 Epoch: 19 Global Step: 236730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:01,015-Speed 3323.02 samples/sec Loss 0.7139 LearningRate 0.0002 Epoch: 19 Global Step: 236740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:04,118-Speed 3300.58 samples/sec Loss 0.7480 LearningRate 0.0002 Epoch: 19 Global Step: 236750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:07,208-Speed 3314.76 samples/sec Loss 0.7061 LearningRate 0.0002 Epoch: 19 Global Step: 236760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:10,287-Speed 3327.67 samples/sec Loss 0.7267 LearningRate 0.0002 Epoch: 19 Global Step: 236770 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:13,356-Speed 3337.08 samples/sec Loss 0.7040 LearningRate 0.0002 Epoch: 19 Global Step: 236780 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:16,429-Speed 3333.99 samples/sec Loss 
0.7732 LearningRate 0.0002 Epoch: 19 Global Step: 236790 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:19,579-Speed 3251.27 samples/sec Loss 0.7477 LearningRate 0.0002 Epoch: 19 Global Step: 236800 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:22,655-Speed 3331.17 samples/sec Loss 0.7426 LearningRate 0.0002 Epoch: 19 Global Step: 236810 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:25,730-Speed 3330.90 samples/sec Loss 0.7193 LearningRate 0.0002 Epoch: 19 Global Step: 236820 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:28,818-Speed 3317.32 samples/sec Loss 0.7383 LearningRate 0.0002 Epoch: 19 Global Step: 236830 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:31,886-Speed 3338.19 samples/sec Loss 0.7562 LearningRate 0.0002 Epoch: 19 Global Step: 236840 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:35,019-Speed 3269.46 samples/sec Loss 0.7061 LearningRate 0.0002 Epoch: 19 Global Step: 236850 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:38,142-Speed 3280.00 samples/sec Loss 0.7050 LearningRate 0.0002 Epoch: 19 Global Step: 236860 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:17:41,319-Speed 3223.91 samples/sec Loss 0.7246 LearningRate 0.0002 Epoch: 19 Global Step: 236870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:44,443-Speed 3279.48 samples/sec Loss 0.7254 LearningRate 0.0002 Epoch: 19 Global Step: 236880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:47,573-Speed 3272.12 samples/sec Loss 0.7089 LearningRate 0.0002 Epoch: 19 Global Step: 236890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:50,654-Speed 3324.47 samples/sec Loss 0.7245 LearningRate 0.0002 Epoch: 19 Global Step: 236900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:53,730-Speed 3330.85 samples/sec Loss 0.7836 LearningRate 0.0002 Epoch: 19 Global Step: 
236910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:56,801-Speed 3335.46 samples/sec Loss 0.7447 LearningRate 0.0002 Epoch: 19 Global Step: 236920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:17:59,882-Speed 3323.76 samples/sec Loss 0.7246 LearningRate 0.0002 Epoch: 19 Global Step: 236930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:02,973-Speed 3315.31 samples/sec Loss 0.7429 LearningRate 0.0002 Epoch: 19 Global Step: 236940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:06,049-Speed 3329.93 samples/sec Loss 0.7143 LearningRate 0.0002 Epoch: 19 Global Step: 236950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:09,116-Speed 3339.52 samples/sec Loss 0.7266 LearningRate 0.0002 Epoch: 19 Global Step: 236960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:12,176-Speed 3348.57 samples/sec Loss 0.7353 LearningRate 0.0002 Epoch: 19 Global Step: 236970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:18:15,265-Speed 3315.86 samples/sec Loss 0.7310 LearningRate 0.0002 Epoch: 19 Global Step: 236980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:18:18,387-Speed 3280.75 samples/sec Loss 0.7255 LearningRate 0.0002 Epoch: 19 Global Step: 236990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:21,479-Speed 3312.96 samples/sec Loss 0.7537 LearningRate 0.0002 Epoch: 19 Global Step: 237000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:24,633-Speed 3247.84 samples/sec Loss 0.7412 LearningRate 0.0002 Epoch: 19 Global Step: 237010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:27,707-Speed 3331.63 samples/sec Loss 0.7153 LearningRate 0.0002 Epoch: 19 Global Step: 237020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:30,763-Speed 3351.47 samples/sec Loss 0.7107 LearningRate 0.0002 Epoch: 19 Global Step: 237030 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-04-27 22:18:33,888-Speed 3278.06 samples/sec Loss 0.7368 LearningRate 0.0002 Epoch: 19 Global Step: 237040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:36,975-Speed 3318.29 samples/sec Loss 0.7446 LearningRate 0.0002 Epoch: 19 Global Step: 237050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:40,086-Speed 3292.84 samples/sec Loss 0.7132 LearningRate 0.0002 Epoch: 19 Global Step: 237060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:43,221-Speed 3267.54 samples/sec Loss 0.7299 LearningRate 0.0002 Epoch: 19 Global Step: 237070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:46,338-Speed 3286.08 samples/sec Loss 0.7140 LearningRate 0.0002 Epoch: 19 Global Step: 237080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:49,448-Speed 3294.02 samples/sec Loss 0.7063 LearningRate 0.0002 Epoch: 19 Global Step: 237090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:18:52,559-Speed 3293.02 samples/sec Loss 0.7398 LearningRate 0.0002 Epoch: 19 Global Step: 237100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:18:55,695-Speed 3265.85 samples/sec Loss 0.7209 LearningRate 0.0002 Epoch: 19 Global Step: 237110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:18:58,787-Speed 3312.18 samples/sec Loss 0.7047 LearningRate 0.0002 Epoch: 19 Global Step: 237120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:01,960-Speed 3229.12 samples/sec Loss 0.7574 LearningRate 0.0002 Epoch: 19 Global Step: 237130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:05,028-Speed 3338.50 samples/sec Loss 0.7544 LearningRate 0.0002 Epoch: 19 Global Step: 237140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:08,123-Speed 3309.96 samples/sec Loss 0.7399 LearningRate 0.0002 Epoch: 19 Global Step: 237150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
22:19:11,185-Speed 3344.89 samples/sec Loss 0.7223 LearningRate 0.0002 Epoch: 19 Global Step: 237160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:14,274-Speed 3316.01 samples/sec Loss 0.7385 LearningRate 0.0002 Epoch: 19 Global Step: 237170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:17,370-Speed 3308.66 samples/sec Loss 0.7559 LearningRate 0.0002 Epoch: 19 Global Step: 237180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:20,443-Speed 3333.43 samples/sec Loss 0.7443 LearningRate 0.0002 Epoch: 19 Global Step: 237190 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:23,558-Speed 3288.41 samples/sec Loss 0.7478 LearningRate 0.0002 Epoch: 19 Global Step: 237200 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:26,740-Speed 3219.65 samples/sec Loss 0.7466 LearningRate 0.0002 Epoch: 19 Global Step: 237210 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:29,865-Speed 3277.35 samples/sec Loss 0.7256 LearningRate 0.0002 Epoch: 19 Global Step: 237220 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:32,949-Speed 3321.80 samples/sec Loss 0.7552 LearningRate 0.0002 Epoch: 19 Global Step: 237230 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:36,073-Speed 3278.66 samples/sec Loss 0.7292 LearningRate 0.0002 Epoch: 19 Global Step: 237240 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:39,135-Speed 3345.11 samples/sec Loss 0.7626 LearningRate 0.0002 Epoch: 19 Global Step: 237250 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:42,204-Speed 3337.68 samples/sec Loss 0.7157 LearningRate 0.0002 Epoch: 19 Global Step: 237260 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:45,281-Speed 3328.96 samples/sec Loss 0.7349 LearningRate 0.0002 Epoch: 19 Global Step: 237270 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:48,407-Speed 3277.25 samples/sec Loss 0.7277 
LearningRate 0.0002 Epoch: 19 Global Step: 237280 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:19:51,538-Speed 3271.26 samples/sec Loss 0.7115 LearningRate 0.0002 Epoch: 19 Global Step: 237290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:54,677-Speed 3262.43 samples/sec Loss 0.7353 LearningRate 0.0002 Epoch: 19 Global Step: 237300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:19:57,730-Speed 3355.63 samples/sec Loss 0.7054 LearningRate 0.0002 Epoch: 19 Global Step: 237310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:00,824-Speed 3311.28 samples/sec Loss 0.7290 LearningRate 0.0002 Epoch: 19 Global Step: 237320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:03,966-Speed 3260.27 samples/sec Loss 0.7204 LearningRate 0.0002 Epoch: 19 Global Step: 237330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:07,021-Speed 3354.14 samples/sec Loss 0.7163 LearningRate 0.0002 Epoch: 19 Global Step: 237340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:10,105-Speed 3321.29 samples/sec Loss 0.7460 LearningRate 0.0002 Epoch: 19 Global Step: 237350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:13,229-Speed 3278.75 samples/sec Loss 0.7115 LearningRate 0.0002 Epoch: 19 Global Step: 237360 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:16,356-Speed 3276.08 samples/sec Loss 0.7608 LearningRate 0.0002 Epoch: 19 Global Step: 237370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:19,413-Speed 3350.65 samples/sec Loss 0.7204 LearningRate 0.0002 Epoch: 19 Global Step: 237380 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:22,489-Speed 3330.55 samples/sec Loss 0.7273 LearningRate 0.0002 Epoch: 19 Global Step: 237390 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:25,568-Speed 3325.92 samples/sec Loss 0.7086 LearningRate 0.0002 Epoch: 19 Global Step: 237400 
Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:28,645-Speed 3329.82 samples/sec Loss 0.7520 LearningRate 0.0002 Epoch: 19 Global Step: 237410 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:31,794-Speed 3253.03 samples/sec Loss 0.7340 LearningRate 0.0002 Epoch: 19 Global Step: 237420 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:34,919-Speed 3277.76 samples/sec Loss 0.7222 LearningRate 0.0002 Epoch: 19 Global Step: 237430 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:38,020-Speed 3303.35 samples/sec Loss 0.7463 LearningRate 0.0002 Epoch: 19 Global Step: 237440 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:20:41,125-Speed 3299.06 samples/sec Loss 0.7346 LearningRate 0.0002 Epoch: 19 Global Step: 237450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:44,187-Speed 3345.03 samples/sec Loss 0.7063 LearningRate 0.0002 Epoch: 19 Global Step: 237460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:47,243-Speed 3351.11 samples/sec Loss 0.7310 LearningRate 0.0002 Epoch: 19 Global Step: 237470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:50,328-Speed 3320.96 samples/sec Loss 0.7145 LearningRate 0.0002 Epoch: 19 Global Step: 237480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:53,479-Speed 3250.30 samples/sec Loss 0.7378 LearningRate 0.0002 Epoch: 19 Global Step: 237490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:56,557-Speed 3328.47 samples/sec Loss 0.7326 LearningRate 0.0002 Epoch: 19 Global Step: 237500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:20:59,652-Speed 3309.64 samples/sec Loss 0.6945 LearningRate 0.0002 Epoch: 19 Global Step: 237510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:02,755-Speed 3300.42 samples/sec Loss 0.7041 LearningRate 0.0002 Epoch: 19 Global Step: 237520 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-27 22:21:05,897-Speed 3260.34 samples/sec Loss 0.6988 LearningRate 0.0002 Epoch: 19 Global Step: 237530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:09,032-Speed 3267.47 samples/sec Loss 0.7431 LearningRate 0.0002 Epoch: 19 Global Step: 237540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:12,123-Speed 3313.95 samples/sec Loss 0.7486 LearningRate 0.0002 Epoch: 19 Global Step: 237550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:15,268-Speed 3257.00 samples/sec Loss 0.7188 LearningRate 0.0002 Epoch: 19 Global Step: 237560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:18,421-Speed 3248.53 samples/sec Loss 0.7346 LearningRate 0.0002 Epoch: 19 Global Step: 237570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:21,493-Speed 3335.32 samples/sec Loss 0.7408 LearningRate 0.0002 Epoch: 19 Global Step: 237580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:24,588-Speed 3308.64 samples/sec Loss 0.7090 LearningRate 0.0002 Epoch: 19 Global Step: 237590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:27,666-Speed 3328.43 samples/sec Loss 0.7085 LearningRate 0.0002 Epoch: 19 Global Step: 237600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:30,740-Speed 3332.46 samples/sec Loss 0.7220 LearningRate 0.0002 Epoch: 19 Global Step: 237610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:33,807-Speed 3340.08 samples/sec Loss 0.7523 LearningRate 0.0002 Epoch: 19 Global Step: 237620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:36,884-Speed 3328.61 samples/sec Loss 0.7211 LearningRate 0.0002 Epoch: 19 Global Step: 237630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:39,959-Speed 3332.34 samples/sec Loss 0.7591 LearningRate 0.0002 Epoch: 19 Global Step: 237640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:43,082-Speed 
3279.70 samples/sec Loss 0.7527 LearningRate 0.0002 Epoch: 19 Global Step: 237650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:21:46,143-Speed 3346.78 samples/sec Loss 0.7589 LearningRate 0.0002 Epoch: 19 Global Step: 237660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:49,258-Speed 3287.73 samples/sec Loss 0.7521 LearningRate 0.0002 Epoch: 19 Global Step: 237670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:52,370-Speed 3291.68 samples/sec Loss 0.7240 LearningRate 0.0002 Epoch: 19 Global Step: 237680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:21:55,442-Speed 3334.85 samples/sec Loss 0.7289 LearningRate 0.0002 Epoch: 19 Global Step: 237690 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:21:58,494-Speed 3356.88 samples/sec Loss 0.7426 LearningRate 0.0002 Epoch: 19 Global Step: 237700 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:01,612-Speed 3284.03 samples/sec Loss 0.7692 LearningRate 0.0002 Epoch: 19 Global Step: 237710 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:04,728-Speed 3287.67 samples/sec Loss 0.7220 LearningRate 0.0002 Epoch: 19 Global Step: 237720 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:07,868-Speed 3262.81 samples/sec Loss 0.7204 LearningRate 0.0002 Epoch: 19 Global Step: 237730 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:10,945-Speed 3328.14 samples/sec Loss 0.7549 LearningRate 0.0002 Epoch: 19 Global Step: 237740 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:14,113-Speed 3233.65 samples/sec Loss 0.7416 LearningRate 0.0002 Epoch: 19 Global Step: 237750 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:17,260-Speed 3255.49 samples/sec Loss 0.7343 LearningRate 0.0002 Epoch: 19 Global Step: 237760 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:20,354-Speed 3310.54 samples/sec Loss 0.7401 LearningRate 0.0002 
Epoch: 19 Global Step: 237770 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:23,456-Speed 3301.59 samples/sec Loss 0.7115 LearningRate 0.0002 Epoch: 19 Global Step: 237780 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:26,613-Speed 3244.53 samples/sec Loss 0.7179 LearningRate 0.0002 Epoch: 19 Global Step: 237790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:22:29,737-Speed 3279.37 samples/sec Loss 0.7232 LearningRate 0.0002 Epoch: 19 Global Step: 237800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:22:32,861-Speed 3278.52 samples/sec Loss 0.7403 LearningRate 0.0002 Epoch: 19 Global Step: 237810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:22:35,981-Speed 3283.75 samples/sec Loss 0.7319 LearningRate 0.0002 Epoch: 19 Global Step: 237820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:22:39,120-Speed 3262.86 samples/sec Loss 0.6903 LearningRate 0.0002 Epoch: 19 Global Step: 237830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:22:42,206-Speed 3319.22 samples/sec Loss 0.7184 LearningRate 0.0002 Epoch: 19 Global Step: 237840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:22:45,328-Speed 3280.94 samples/sec Loss 0.7092 LearningRate 0.0002 Epoch: 19 Global Step: 237850 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:48,534-Speed 3194.90 samples/sec Loss 0.6940 LearningRate 0.0002 Epoch: 19 Global Step: 237860 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:51,698-Speed 3238.39 samples/sec Loss 0.7133 LearningRate 0.0002 Epoch: 19 Global Step: 237870 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:54,916-Speed 3182.44 samples/sec Loss 0.7114 LearningRate 0.0002 Epoch: 19 Global Step: 237880 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:22:58,152-Speed 3165.10 samples/sec Loss 0.7589 LearningRate 0.0002 Epoch: 19 Global Step: 237890 Fp16 Grad Scale: 
8192 Required: 1 hours Training: 2022-04-27 22:23:01,333-Speed 3220.39 samples/sec Loss 0.7437 LearningRate 0.0002 Epoch: 19 Global Step: 237900 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:04,471-Speed 3264.70 samples/sec Loss 0.7103 LearningRate 0.0002 Epoch: 19 Global Step: 237910 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:07,579-Speed 3295.64 samples/sec Loss 0.7608 LearningRate 0.0002 Epoch: 19 Global Step: 237920 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:10,631-Speed 3356.11 samples/sec Loss 0.7432 LearningRate 0.0002 Epoch: 19 Global Step: 237930 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:13,759-Speed 3275.21 samples/sec Loss 0.7277 LearningRate 0.0002 Epoch: 19 Global Step: 237940 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:16,856-Speed 3307.19 samples/sec Loss 0.7638 LearningRate 0.0002 Epoch: 19 Global Step: 237950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:23:20,004-Speed 3254.25 samples/sec Loss 0.7409 LearningRate 0.0002 Epoch: 19 Global Step: 237960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:23:23,104-Speed 3304.76 samples/sec Loss 0.7403 LearningRate 0.0002 Epoch: 19 Global Step: 237970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:23:26,240-Speed 3265.68 samples/sec Loss 0.7152 LearningRate 0.0002 Epoch: 19 Global Step: 237980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:23:29,349-Speed 3295.61 samples/sec Loss 0.7327 LearningRate 0.0002 Epoch: 19 Global Step: 237990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:23:32,419-Speed 3336.43 samples/sec Loss 0.7199 LearningRate 0.0002 Epoch: 19 Global Step: 238000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:23:35,541-Speed 3280.67 samples/sec Loss 0.7360 LearningRate 0.0002 Epoch: 19 Global Step: 238010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
22:23:38,663-Speed 3281.13 samples/sec Loss 0.7327 LearningRate 0.0002 Epoch: 19 Global Step: 238020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:41,819-Speed 3245.33 samples/sec Loss 0.7060 LearningRate 0.0002 Epoch: 19 Global Step: 238030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:44,923-Speed 3300.43 samples/sec Loss 0.7558 LearningRate 0.0002 Epoch: 19 Global Step: 238040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:48,089-Speed 3236.21 samples/sec Loss 0.7367 LearningRate 0.0002 Epoch: 19 Global Step: 238050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:51,244-Speed 3245.97 samples/sec Loss 0.7246 LearningRate 0.0002 Epoch: 19 Global Step: 238060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:54,394-Speed 3252.55 samples/sec Loss 0.7540 LearningRate 0.0002 Epoch: 19 Global Step: 238070 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:23:57,490-Speed 3308.40 samples/sec Loss 0.7195 LearningRate 0.0002 Epoch: 19 Global Step: 238080 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:24:00,620-Speed 3272.54 samples/sec Loss 0.7452 LearningRate 0.0002 Epoch: 19 Global Step: 238090 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:24:03,716-Speed 3309.19 samples/sec Loss 0.7611 LearningRate 0.0002 Epoch: 19 Global Step: 238100 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:24:06,840-Speed 3278.71 samples/sec Loss 0.7102 LearningRate 0.0002 Epoch: 19 Global Step: 238110 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:24:09,925-Speed 3320.14 samples/sec Loss 0.7459 LearningRate 0.0002 Epoch: 19 Global Step: 238120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:13,054-Speed 3273.36 samples/sec Loss 0.7366 LearningRate 0.0002 Epoch: 19 Global Step: 238130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:16,172-Speed 3285.21 samples/sec Loss 0.7478 
LearningRate 0.0002 Epoch: 19 Global Step: 238140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:19,276-Speed 3299.91 samples/sec Loss 0.7205 LearningRate 0.0002 Epoch: 19 Global Step: 238150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:22,378-Speed 3302.84 samples/sec Loss 0.7260 LearningRate 0.0002 Epoch: 19 Global Step: 238160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:25,445-Speed 3339.87 samples/sec Loss 0.7290 LearningRate 0.0002 Epoch: 19 Global Step: 238170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:28,534-Speed 3315.25 samples/sec Loss 0.7152 LearningRate 0.0002 Epoch: 19 Global Step: 238180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:31,624-Speed 3315.30 samples/sec Loss 0.7681 LearningRate 0.0002 Epoch: 19 Global Step: 238190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:34,762-Speed 3264.04 samples/sec Loss 0.7151 LearningRate 0.0002 Epoch: 19 Global Step: 238200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:37,876-Speed 3289.49 samples/sec Loss 0.7265 LearningRate 0.0002 Epoch: 19 Global Step: 238210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:40,968-Speed 3313.28 samples/sec Loss 0.7378 LearningRate 0.0002 Epoch: 19 Global Step: 238220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:24:44,102-Speed 3268.19 samples/sec Loss 0.7327 LearningRate 0.0002 Epoch: 19 Global Step: 238230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:24:47,236-Speed 3268.85 samples/sec Loss 0.7446 LearningRate 0.0002 Epoch: 19 Global Step: 238240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:50,386-Speed 3251.49 samples/sec Loss 0.7395 LearningRate 0.0002 Epoch: 19 Global Step: 238250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:53,472-Speed 3319.32 samples/sec Loss 0.7295 LearningRate 0.0002 Epoch: 19 Global Step: 
238260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:56,614-Speed 3260.41 samples/sec Loss 0.7065 LearningRate 0.0002 Epoch: 19 Global Step: 238270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:24:59,715-Speed 3303.70 samples/sec Loss 0.7202 LearningRate 0.0002 Epoch: 19 Global Step: 238280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:02,795-Speed 3326.05 samples/sec Loss 0.7315 LearningRate 0.0002 Epoch: 19 Global Step: 238290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:05,953-Speed 3242.76 samples/sec Loss 0.7520 LearningRate 0.0002 Epoch: 19 Global Step: 238300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:09,064-Speed 3292.87 samples/sec Loss 0.7320 LearningRate 0.0002 Epoch: 19 Global Step: 238310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:12,159-Speed 3310.51 samples/sec Loss 0.7501 LearningRate 0.0002 Epoch: 19 Global Step: 238320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:15,312-Speed 3248.35 samples/sec Loss 0.7282 LearningRate 0.0002 Epoch: 19 Global Step: 238330 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:18,473-Speed 3240.97 samples/sec Loss 0.7135 LearningRate 0.0002 Epoch: 19 Global Step: 238340 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:21,560-Speed 3317.19 samples/sec Loss 0.7152 LearningRate 0.0002 Epoch: 19 Global Step: 238350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:24,758-Speed 3203.11 samples/sec Loss 0.7176 LearningRate 0.0002 Epoch: 19 Global Step: 238360 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:27,943-Speed 3216.84 samples/sec Loss 0.6763 LearningRate 0.0002 Epoch: 19 Global Step: 238370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:31,054-Speed 3291.92 samples/sec Loss 0.7246 LearningRate 0.0002 Epoch: 19 Global Step: 238380 Fp16 Grad Scale: 8192 Required: 1 hours 
Training: 2022-04-27 22:25:34,130-Speed 3329.72 samples/sec Loss 0.7126 LearningRate 0.0002 Epoch: 19 Global Step: 238390 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:37,240-Speed 3294.31 samples/sec Loss 0.7334 LearningRate 0.0002 Epoch: 19 Global Step: 238400 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:40,392-Speed 3249.54 samples/sec Loss 0.7609 LearningRate 0.0002 Epoch: 19 Global Step: 238410 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:43,550-Speed 3243.25 samples/sec Loss 0.7411 LearningRate 0.0002 Epoch: 19 Global Step: 238420 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:25:46,691-Speed 3261.06 samples/sec Loss 0.7422 LearningRate 0.0002 Epoch: 19 Global Step: 238430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:49,891-Speed 3201.37 samples/sec Loss 0.6898 LearningRate 0.0002 Epoch: 19 Global Step: 238440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:53,018-Speed 3276.11 samples/sec Loss 0.7426 LearningRate 0.0002 Epoch: 19 Global Step: 238450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:56,140-Speed 3281.42 samples/sec Loss 0.7192 LearningRate 0.0002 Epoch: 19 Global Step: 238460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:25:59,308-Speed 3232.94 samples/sec Loss 0.7005 LearningRate 0.0002 Epoch: 19 Global Step: 238470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:02,404-Speed 3308.43 samples/sec Loss 0.7360 LearningRate 0.0002 Epoch: 19 Global Step: 238480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:05,525-Speed 3281.63 samples/sec Loss 0.7817 LearningRate 0.0002 Epoch: 19 Global Step: 238490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:08,640-Speed 3288.60 samples/sec Loss 0.7266 LearningRate 0.0002 Epoch: 19 Global Step: 238500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:11,821-Speed 3219.69 
samples/sec Loss 0.7257 LearningRate 0.0002 Epoch: 19 Global Step: 238510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:14,927-Speed 3297.79 samples/sec Loss 0.7160 LearningRate 0.0002 Epoch: 19 Global Step: 238520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:18,065-Speed 3265.00 samples/sec Loss 0.7527 LearningRate 0.0002 Epoch: 19 Global Step: 238530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:21,141-Speed 3330.73 samples/sec Loss 0.7381 LearningRate 0.0002 Epoch: 19 Global Step: 238540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:24,260-Speed 3283.55 samples/sec Loss 0.7366 LearningRate 0.0002 Epoch: 19 Global Step: 238550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:27,380-Speed 3282.93 samples/sec Loss 0.7157 LearningRate 0.0002 Epoch: 19 Global Step: 238560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:30,472-Speed 3313.64 samples/sec Loss 0.7176 LearningRate 0.0002 Epoch: 19 Global Step: 238570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:33,538-Speed 3341.01 samples/sec Loss 0.7060 LearningRate 0.0002 Epoch: 19 Global Step: 238580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:36,669-Speed 3271.32 samples/sec Loss 0.6786 LearningRate 0.0002 Epoch: 19 Global Step: 238590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:39,792-Speed 3280.36 samples/sec Loss 0.7301 LearningRate 0.0002 Epoch: 19 Global Step: 238600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:42,914-Speed 3280.07 samples/sec Loss 0.7000 LearningRate 0.0002 Epoch: 19 Global Step: 238610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:46,022-Speed 3295.63 samples/sec Loss 0.7440 LearningRate 0.0002 Epoch: 19 Global Step: 238620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:49,110-Speed 3318.02 samples/sec Loss 0.7434 LearningRate 0.0002 
Epoch: 19 Global Step: 238630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:26:52,213-Speed 3301.05 samples/sec Loss 0.7110 LearningRate 0.0002 Epoch: 19 Global Step: 238640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:55,357-Speed 3258.10 samples/sec Loss 0.7338 LearningRate 0.0002 Epoch: 19 Global Step: 238650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:26:58,424-Speed 3340.11 samples/sec Loss 0.7365 LearningRate 0.0002 Epoch: 19 Global Step: 238660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:01,516-Speed 3312.50 samples/sec Loss 0.7325 LearningRate 0.0002 Epoch: 19 Global Step: 238670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:04,662-Speed 3256.16 samples/sec Loss 0.7878 LearningRate 0.0002 Epoch: 19 Global Step: 238680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:07,744-Speed 3323.02 samples/sec Loss 0.7432 LearningRate 0.0002 Epoch: 19 Global Step: 238690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:10,805-Speed 3348.16 samples/sec Loss 0.7597 LearningRate 0.0002 Epoch: 19 Global Step: 238700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:13,989-Speed 3217.18 samples/sec Loss 0.7368 LearningRate 0.0002 Epoch: 19 Global Step: 238710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:17,125-Speed 3266.24 samples/sec Loss 0.7187 LearningRate 0.0002 Epoch: 19 Global Step: 238720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:20,207-Speed 3322.92 samples/sec Loss 0.7196 LearningRate 0.0002 Epoch: 19 Global Step: 238730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:23,282-Speed 3331.20 samples/sec Loss 0.7475 LearningRate 0.0002 Epoch: 19 Global Step: 238740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:27:26,437-Speed 3246.47 samples/sec Loss 0.7151 LearningRate 0.0002 Epoch: 19 Global Step: 238750 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:29,501-Speed 3343.53 samples/sec Loss 0.7215 LearningRate 0.0002 Epoch: 19 Global Step: 238760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:32,661-Speed 3241.64 samples/sec Loss 0.7303 LearningRate 0.0002 Epoch: 19 Global Step: 238770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:35,820-Speed 3242.30 samples/sec Loss 0.7440 LearningRate 0.0002 Epoch: 19 Global Step: 238780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:38,945-Speed 3277.50 samples/sec Loss 0.7187 LearningRate 0.0002 Epoch: 19 Global Step: 238790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:42,091-Speed 3256.77 samples/sec Loss 0.7291 LearningRate 0.0001 Epoch: 19 Global Step: 238800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:45,214-Speed 3279.46 samples/sec Loss 0.7406 LearningRate 0.0001 Epoch: 19 Global Step: 238810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:27:48,274-Speed 3346.79 samples/sec Loss 0.7572 LearningRate 0.0001 Epoch: 19 Global Step: 238820 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:27:51,341-Speed 3340.28 samples/sec Loss 0.7174 LearningRate 0.0001 Epoch: 19 Global Step: 238830 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:27:54,474-Speed 3269.47 samples/sec Loss 0.7255 LearningRate 0.0001 Epoch: 19 Global Step: 238840 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:27:57,557-Speed 3322.50 samples/sec Loss 0.7265 LearningRate 0.0001 Epoch: 19 Global Step: 238850 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:28:00,669-Speed 3291.15 samples/sec Loss 0.7099 LearningRate 0.0001 Epoch: 19 Global Step: 238860 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:28:03,793-Speed 3279.21 samples/sec Loss 0.7319 LearningRate 0.0001 Epoch: 19 Global Step: 238870 Fp16 Grad Scale: 8192 Required: 1 hours Training: 
2022-04-27 22:28:07,045-Speed 3150.16 samples/sec Loss 0.6923 LearningRate 0.0001 Epoch: 19 Global Step: 238880 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:28:10,115-Speed 3336.39 samples/sec Loss 0.7150 LearningRate 0.0001 Epoch: 19 Global Step: 238890 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:28:13,262-Speed 3255.06 samples/sec Loss 0.7149 LearningRate 0.0001 Epoch: 19 Global Step: 238900 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:28:16,406-Speed 3258.22 samples/sec Loss 0.7230 LearningRate 0.0001 Epoch: 19 Global Step: 238910 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:28:19,480-Speed 3332.09 samples/sec Loss 0.7316 LearningRate 0.0001 Epoch: 19 Global Step: 238920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:22,527-Speed 3361.61 samples/sec Loss 0.7216 LearningRate 0.0001 Epoch: 19 Global Step: 238930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:25,621-Speed 3310.94 samples/sec Loss 0.7311 LearningRate 0.0001 Epoch: 19 Global Step: 238940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:28,694-Speed 3333.70 samples/sec Loss 0.6990 LearningRate 0.0001 Epoch: 19 Global Step: 238950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:31,801-Speed 3296.50 samples/sec Loss 0.7229 LearningRate 0.0001 Epoch: 19 Global Step: 238960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:34,918-Speed 3286.64 samples/sec Loss 0.7330 LearningRate 0.0001 Epoch: 19 Global Step: 238970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:38,010-Speed 3313.10 samples/sec Loss 0.6976 LearningRate 0.0001 Epoch: 19 Global Step: 238980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:41,197-Speed 3213.47 samples/sec Loss 0.7090 LearningRate 0.0001 Epoch: 19 Global Step: 238990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:44,294-Speed 3307.16 
samples/sec Loss 0.7145 LearningRate 0.0001 Epoch: 19 Global Step: 239000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:47,394-Speed 3304.93 samples/sec Loss 0.7504 LearningRate 0.0001 Epoch: 19 Global Step: 239010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:50,480-Speed 3318.51 samples/sec Loss 0.7146 LearningRate 0.0001 Epoch: 19 Global Step: 239020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:28:53,607-Speed 3276.17 samples/sec Loss 0.7204 LearningRate 0.0001 Epoch: 19 Global Step: 239030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:28:56,671-Speed 3343.39 samples/sec Loss 0.7630 LearningRate 0.0001 Epoch: 19 Global Step: 239040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:28:59,777-Speed 3297.95 samples/sec Loss 0.7449 LearningRate 0.0001 Epoch: 19 Global Step: 239050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:02,984-Speed 3194.04 samples/sec Loss 0.7490 LearningRate 0.0001 Epoch: 19 Global Step: 239060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:06,115-Speed 3271.54 samples/sec Loss 0.7346 LearningRate 0.0001 Epoch: 19 Global Step: 239070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:09,239-Speed 3279.03 samples/sec Loss 0.7411 LearningRate 0.0001 Epoch: 19 Global Step: 239080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:12,375-Speed 3266.09 samples/sec Loss 0.7512 LearningRate 0.0001 Epoch: 19 Global Step: 239090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:15,546-Speed 3229.89 samples/sec Loss 0.7171 LearningRate 0.0001 Epoch: 19 Global Step: 239100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:18,662-Speed 3287.38 samples/sec Loss 0.7004 LearningRate 0.0001 Epoch: 19 Global Step: 239110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:21,733-Speed 3335.26 samples/sec Loss 0.7318 LearningRate 0.0001 
Epoch: 19 Global Step: 239120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:24,797-Speed 3343.61 samples/sec Loss 0.6965 LearningRate 0.0001 Epoch: 19 Global Step: 239130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:27,939-Speed 3260.16 samples/sec Loss 0.7347 LearningRate 0.0001 Epoch: 19 Global Step: 239140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:29:31,071-Speed 3270.18 samples/sec Loss 0.7195 LearningRate 0.0001 Epoch: 19 Global Step: 239150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:29:34,147-Speed 3330.14 samples/sec Loss 0.7254 LearningRate 0.0001 Epoch: 19 Global Step: 239160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:29:37,231-Speed 3322.10 samples/sec Loss 0.7196 LearningRate 0.0001 Epoch: 19 Global Step: 239170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:29:40,326-Speed 3309.24 samples/sec Loss 0.7326 LearningRate 0.0001 Epoch: 19 Global Step: 239180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:43,526-Speed 3201.26 samples/sec Loss 0.7473 LearningRate 0.0001 Epoch: 19 Global Step: 239190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:29:46,651-Speed 3276.88 samples/sec Loss 0.7240 LearningRate 0.0001 Epoch: 19 Global Step: 239200 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:29:49,733-Speed 3324.52 samples/sec Loss 0.7638 LearningRate 0.0001 Epoch: 19 Global Step: 239210 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:29:52,843-Speed 3293.27 samples/sec Loss 0.7177 LearningRate 0.0001 Epoch: 19 Global Step: 239220 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:29:55,938-Speed 3309.14 samples/sec Loss 0.7093 LearningRate 0.0001 Epoch: 19 Global Step: 239230 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:29:59,012-Speed 3332.85 samples/sec Loss 0.7222 LearningRate 0.0001 Epoch: 19 Global Step: 239240 Fp16 Grad Scale: 
8192 Required: 1 hours Training: 2022-04-27 22:30:02,165-Speed 3248.73 samples/sec Loss 0.7356 LearningRate 0.0001 Epoch: 19 Global Step: 239250 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:05,257-Speed 3313.35 samples/sec Loss 0.7397 LearningRate 0.0001 Epoch: 19 Global Step: 239260 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:08,376-Speed 3283.03 samples/sec Loss 0.6859 LearningRate 0.0001 Epoch: 19 Global Step: 239270 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:11,533-Speed 3244.71 samples/sec Loss 0.7249 LearningRate 0.0001 Epoch: 19 Global Step: 239280 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:14,663-Speed 3273.18 samples/sec Loss 0.7228 LearningRate 0.0001 Epoch: 19 Global Step: 239290 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:17,775-Speed 3291.79 samples/sec Loss 0.7268 LearningRate 0.0001 Epoch: 19 Global Step: 239300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:30:20,842-Speed 3339.26 samples/sec Loss 0.7515 LearningRate 0.0001 Epoch: 19 Global Step: 239310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:30:23,912-Speed 3336.52 samples/sec Loss 0.7677 LearningRate 0.0001 Epoch: 19 Global Step: 239320 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:27,075-Speed 3238.48 samples/sec Loss 0.7561 LearningRate 0.0001 Epoch: 19 Global Step: 239330 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:30,201-Speed 3276.15 samples/sec Loss 0.7523 LearningRate 0.0001 Epoch: 19 Global Step: 239340 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:33,337-Speed 3266.72 samples/sec Loss 0.7540 LearningRate 0.0001 Epoch: 19 Global Step: 239350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:36,437-Speed 3304.51 samples/sec Loss 0.7353 LearningRate 0.0001 Epoch: 19 Global Step: 239360 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 
22:30:39,538-Speed 3302.56 samples/sec Loss 0.7204 LearningRate 0.0001 Epoch: 19 Global Step: 239370 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:42,640-Speed 3302.51 samples/sec Loss 0.7349 LearningRate 0.0001 Epoch: 19 Global Step: 239380 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:45,691-Speed 3357.44 samples/sec Loss 0.7060 LearningRate 0.0001 Epoch: 19 Global Step: 239390 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:48,845-Speed 3247.78 samples/sec Loss 0.7260 LearningRate 0.0001 Epoch: 19 Global Step: 239400 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:51,911-Speed 3341.16 samples/sec Loss 0.7128 LearningRate 0.0001 Epoch: 19 Global Step: 239410 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:30:55,020-Speed 3295.05 samples/sec Loss 0.6878 LearningRate 0.0001 Epoch: 19 Global Step: 239420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:30:58,085-Speed 3341.88 samples/sec Loss 0.7192 LearningRate 0.0001 Epoch: 19 Global Step: 239430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:01,208-Speed 3279.61 samples/sec Loss 0.7298 LearningRate 0.0001 Epoch: 19 Global Step: 239440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:04,336-Speed 3275.20 samples/sec Loss 0.7070 LearningRate 0.0001 Epoch: 19 Global Step: 239450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:07,506-Speed 3231.43 samples/sec Loss 0.7498 LearningRate 0.0001 Epoch: 19 Global Step: 239460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:10,617-Speed 3292.22 samples/sec Loss 0.7778 LearningRate 0.0001 Epoch: 19 Global Step: 239470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:13,694-Speed 3330.19 samples/sec Loss 0.7095 LearningRate 0.0001 Epoch: 19 Global Step: 239480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:16,822-Speed 3275.28 samples/sec Loss 
0.7428 LearningRate 0.0001 Epoch: 19 Global Step: 239490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:19,935-Speed 3289.83 samples/sec Loss 0.7383 LearningRate 0.0001 Epoch: 19 Global Step: 239500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:23,050-Speed 3288.46 samples/sec Loss 0.7216 LearningRate 0.0001 Epoch: 19 Global Step: 239510 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:26,201-Speed 3250.72 samples/sec Loss 0.7161 LearningRate 0.0001 Epoch: 19 Global Step: 239520 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:29,391-Speed 3211.77 samples/sec Loss 0.7212 LearningRate 0.0001 Epoch: 19 Global Step: 239530 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:32,452-Speed 3345.88 samples/sec Loss 0.6894 LearningRate 0.0001 Epoch: 19 Global Step: 239540 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:35,566-Speed 3290.02 samples/sec Loss 0.7621 LearningRate 0.0001 Epoch: 19 Global Step: 239550 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:38,688-Speed 3280.85 samples/sec Loss 0.7090 LearningRate 0.0001 Epoch: 19 Global Step: 239560 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:41,768-Speed 3324.71 samples/sec Loss 0.7268 LearningRate 0.0001 Epoch: 19 Global Step: 239570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:44,872-Speed 3300.27 samples/sec Loss 0.7248 LearningRate 0.0001 Epoch: 19 Global Step: 239580 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:47,996-Speed 3279.15 samples/sec Loss 0.6953 LearningRate 0.0001 Epoch: 19 Global Step: 239590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:51,142-Speed 3256.11 samples/sec Loss 0.7592 LearningRate 0.0001 Epoch: 19 Global Step: 239600 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:31:54,227-Speed 3319.93 samples/sec Loss 0.7103 LearningRate 0.0001 Epoch: 19 Global Step: 
239610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:31:57,315-Speed 3317.26 samples/sec Loss 0.7581 LearningRate 0.0001 Epoch: 19 Global Step: 239620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:00,432-Speed 3286.42 samples/sec Loss 0.7470 LearningRate 0.0001 Epoch: 19 Global Step: 239630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:03,584-Speed 3250.04 samples/sec Loss 0.7443 LearningRate 0.0001 Epoch: 19 Global Step: 239640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:06,700-Speed 3287.37 samples/sec Loss 0.7257 LearningRate 0.0001 Epoch: 19 Global Step: 239650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:09,770-Speed 3336.51 samples/sec Loss 0.7524 LearningRate 0.0001 Epoch: 19 Global Step: 239660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:12,912-Speed 3259.88 samples/sec Loss 0.7122 LearningRate 0.0001 Epoch: 19 Global Step: 239670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:16,011-Speed 3304.82 samples/sec Loss 0.7096 LearningRate 0.0001 Epoch: 19 Global Step: 239680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:19,081-Speed 3337.04 samples/sec Loss 0.7142 LearningRate 0.0001 Epoch: 19 Global Step: 239690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:22,148-Speed 3339.90 samples/sec Loss 0.7375 LearningRate 0.0001 Epoch: 19 Global Step: 239700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:25,323-Speed 3225.19 samples/sec Loss 0.7238 LearningRate 0.0001 Epoch: 19 Global Step: 239710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:32:28,474-Speed 3251.05 samples/sec Loss 0.7331 LearningRate 0.0001 Epoch: 19 Global Step: 239720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:32:31,551-Speed 3328.54 samples/sec Loss 0.7134 LearningRate 0.0001 Epoch: 19 Global Step: 239730 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2022-04-27 22:32:34,698-Speed 3256.05 samples/sec Loss 0.7415 LearningRate 0.0001 Epoch: 19 Global Step: 239740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:32:37,806-Speed 3295.27 samples/sec Loss 0.7195 LearningRate 0.0001 Epoch: 19 Global Step: 239750 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:32:40,881-Speed 3331.85 samples/sec Loss 0.7232 LearningRate 0.0001 Epoch: 19 Global Step: 239760 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:32:43,969-Speed 3318.17 samples/sec Loss 0.7326 LearningRate 0.0001 Epoch: 19 Global Step: 239770 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:32:47,080-Speed 3293.50 samples/sec Loss 0.7309 LearningRate 0.0001 Epoch: 19 Global Step: 239780 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:32:50,185-Speed 3298.83 samples/sec Loss 0.7209 LearningRate 0.0001 Epoch: 19 Global Step: 239790 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:32:53,287-Speed 3301.37 samples/sec Loss 0.7509 LearningRate 0.0001 Epoch: 19 Global Step: 239800 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:32:56,427-Speed 3262.81 samples/sec Loss 0.7118 LearningRate 0.0001 Epoch: 19 Global Step: 239810 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:32:59,525-Speed 3306.33 samples/sec Loss 0.7362 LearningRate 0.0001 Epoch: 19 Global Step: 239820 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:33:02,628-Speed 3300.44 samples/sec Loss 0.7327 LearningRate 0.0001 Epoch: 19 Global Step: 239830 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:33:05,768-Speed 3262.41 samples/sec Loss 0.7340 LearningRate 0.0001 Epoch: 19 Global Step: 239840 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:33:08,880-Speed 3291.35 samples/sec Loss 0.7375 LearningRate 0.0001 Epoch: 19 Global Step: 239850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:12,042-Speed 3239.44 
samples/sec Loss 0.7274 LearningRate 0.0001 Epoch: 19 Global Step: 239860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:15,248-Speed 3194.97 samples/sec Loss 0.7499 LearningRate 0.0001 Epoch: 19 Global Step: 239870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:18,410-Speed 3240.15 samples/sec Loss 0.7073 LearningRate 0.0001 Epoch: 19 Global Step: 239880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:21,515-Speed 3299.02 samples/sec Loss 0.7019 LearningRate 0.0001 Epoch: 19 Global Step: 239890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:24,606-Speed 3313.27 samples/sec Loss 0.7274 LearningRate 0.0001 Epoch: 19 Global Step: 239900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:27,675-Speed 3337.34 samples/sec Loss 0.7068 LearningRate 0.0001 Epoch: 19 Global Step: 239910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:30,852-Speed 3224.67 samples/sec Loss 0.7292 LearningRate 0.0001 Epoch: 19 Global Step: 239920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:33,947-Speed 3309.61 samples/sec Loss 0.7444 LearningRate 0.0001 Epoch: 19 Global Step: 239930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:37,036-Speed 3315.35 samples/sec Loss 0.7189 LearningRate 0.0001 Epoch: 19 Global Step: 239940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:40,148-Speed 3291.75 samples/sec Loss 0.7088 LearningRate 0.0001 Epoch: 19 Global Step: 239950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:33:43,251-Speed 3300.52 samples/sec Loss 0.7600 LearningRate 0.0001 Epoch: 19 Global Step: 239960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:46,353-Speed 3302.87 samples/sec Loss 0.6913 LearningRate 0.0001 Epoch: 19 Global Step: 239970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:49,435-Speed 3323.01 samples/sec Loss 0.7659 LearningRate 0.0001 
Epoch: 19 Global Step: 239980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:52,598-Speed 3238.94 samples/sec Loss 0.7104 LearningRate 0.0001 Epoch: 19 Global Step: 239990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:55,756-Speed 3242.96 samples/sec Loss 0.7408 LearningRate 0.0001 Epoch: 19 Global Step: 240000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:33:58,933-Speed 3224.55 samples/sec Loss 0.7509 LearningRate 0.0001 Epoch: 19 Global Step: 240010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:34:02,094-Speed 3240.24 samples/sec Loss 0.7277 LearningRate 0.0001 Epoch: 19 Global Step: 240020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:34:05,302-Speed 3192.96 samples/sec Loss 0.7283 LearningRate 0.0001 Epoch: 19 Global Step: 240030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:34:08,442-Speed 3262.63 samples/sec Loss 0.7201 LearningRate 0.0001 Epoch: 19 Global Step: 240040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:34:11,509-Speed 3339.75 samples/sec Loss 0.7337 LearningRate 0.0001 Epoch: 19 Global Step: 240050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:34:14,652-Speed 3259.65 samples/sec Loss 0.7445 LearningRate 0.0001 Epoch: 19 Global Step: 240060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:34:17,774-Speed 3280.26 samples/sec Loss 0.6928 LearningRate 0.0001 Epoch: 19 Global Step: 240070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:34:20,895-Speed 3281.89 samples/sec Loss 0.7246 LearningRate 0.0001 Epoch: 19 Global Step: 240080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:34:24,035-Speed 3262.76 samples/sec Loss 0.7146 LearningRate 0.0001 Epoch: 19 Global Step: 240090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:34:27,136-Speed 3302.96 samples/sec Loss 0.7291 LearningRate 0.0001 Epoch: 19 Global Step: 240100 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-27 22:34:30,256-Speed 3283.53 samples/sec Loss 0.7304 LearningRate 0.0001 Epoch: 19 Global Step: 240110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:34:33,327-Speed 3334.65 samples/sec Loss 0.7122 LearningRate 0.0001 Epoch: 19 Global Step: 240120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:34:36,451-Speed 3279.67 samples/sec Loss 0.7321 LearningRate 0.0001 Epoch: 19 Global Step: 240130 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:34:39,602-Speed 3250.38 samples/sec Loss 0.7190 LearningRate 0.0001 Epoch: 19 Global Step: 240140 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:34:42,833-Speed 3170.79 samples/sec Loss 0.7430 LearningRate 0.0001 Epoch: 19 Global Step: 240150 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:34:45,901-Speed 3338.56 samples/sec Loss 0.7228 LearningRate 0.0001 Epoch: 19 Global Step: 240160 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:34:48,986-Speed 3319.74 samples/sec Loss 0.6825 LearningRate 0.0001 Epoch: 19 Global Step: 240170 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:34:52,054-Speed 3339.40 samples/sec Loss 0.6968 LearningRate 0.0001 Epoch: 19 Global Step: 240180 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:34:55,168-Speed 3288.99 samples/sec Loss 0.7302 LearningRate 0.0001 Epoch: 19 Global Step: 240190 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:34:58,233-Speed 3341.95 samples/sec Loss 0.6924 LearningRate 0.0001 Epoch: 19 Global Step: 240200 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:35:01,283-Speed 3358.74 samples/sec Loss 0.7611 LearningRate 0.0001 Epoch: 19 Global Step: 240210 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:35:04,386-Speed 3301.48 samples/sec Loss 0.7336 LearningRate 0.0001 Epoch: 19 Global Step: 240220 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 
22:35:07,485-Speed 3305.33 samples/sec Loss 0.7187 LearningRate 0.0001 Epoch: 19 Global Step: 240230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:10,566-Speed 3324.65 samples/sec Loss 0.7117 LearningRate 0.0001 Epoch: 19 Global Step: 240240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:13,680-Speed 3289.27 samples/sec Loss 0.7019 LearningRate 0.0001 Epoch: 19 Global Step: 240250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:16,803-Speed 3279.74 samples/sec Loss 0.7260 LearningRate 0.0001 Epoch: 19 Global Step: 240260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:19,937-Speed 3268.82 samples/sec Loss 0.7343 LearningRate 0.0001 Epoch: 19 Global Step: 240270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:23,044-Speed 3296.00 samples/sec Loss 0.7594 LearningRate 0.0001 Epoch: 19 Global Step: 240280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:26,166-Speed 3281.55 samples/sec Loss 0.7137 LearningRate 0.0001 Epoch: 19 Global Step: 240290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:29,218-Speed 3355.73 samples/sec Loss 0.7772 LearningRate 0.0001 Epoch: 19 Global Step: 240300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:32,317-Speed 3305.61 samples/sec Loss 0.7527 LearningRate 0.0001 Epoch: 19 Global Step: 240310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:35,425-Speed 3296.24 samples/sec Loss 0.7328 LearningRate 0.0001 Epoch: 19 Global Step: 240320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:38,483-Speed 3349.91 samples/sec Loss 0.7386 LearningRate 0.0001 Epoch: 19 Global Step: 240330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:35:41,650-Speed 3233.63 samples/sec Loss 0.7089 LearningRate 0.0001 Epoch: 19 Global Step: 240340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:35:44,679-Speed 3381.51 samples/sec Loss 
0.7134 LearningRate 0.0001 Epoch: 19 Global Step: 240350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:47,852-Speed 3228.42 samples/sec Loss 0.7528 LearningRate 0.0001 Epoch: 19 Global Step: 240360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:50,937-Speed 3321.05 samples/sec Loss 0.7162 LearningRate 0.0001 Epoch: 19 Global Step: 240370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:54,095-Speed 3243.40 samples/sec Loss 0.7020 LearningRate 0.0001 Epoch: 19 Global Step: 240380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:35:57,172-Speed 3329.49 samples/sec Loss 0.7338 LearningRate 0.0001 Epoch: 19 Global Step: 240390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:36:00,339-Speed 3234.42 samples/sec Loss 0.7442 LearningRate 0.0001 Epoch: 19 Global Step: 240400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:36:03,446-Speed 3297.28 samples/sec Loss 0.7070 LearningRate 0.0001 Epoch: 19 Global Step: 240410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:36:06,548-Speed 3302.14 samples/sec Loss 0.7382 LearningRate 0.0001 Epoch: 19 Global Step: 240420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:36:09,623-Speed 3330.40 samples/sec Loss 0.7230 LearningRate 0.0001 Epoch: 19 Global Step: 240430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:36:12,764-Speed 3262.00 samples/sec Loss 0.7142 LearningRate 0.0001 Epoch: 19 Global Step: 240440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:36:16,460-Speed 2771.23 samples/sec Loss 0.7352 LearningRate 0.0001 Epoch: 19 Global Step: 240450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:36:19,546-Speed 3319.07 samples/sec Loss 0.7274 LearningRate 0.0001 Epoch: 19 Global Step: 240460 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:36:22,618-Speed 3334.00 samples/sec Loss 0.7316 LearningRate 0.0001 Epoch: 19 Global 
Step: 240470 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:36:25,674-Speed 3351.62 samples/sec Loss 0.6914 LearningRate 0.0001 Epoch: 19 Global Step: 240480 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:28,838-Speed 3237.71 samples/sec Loss 0.7126 LearningRate 0.0001 Epoch: 19 Global Step: 240490 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:31,916-Speed 3328.79 samples/sec Loss 0.7325 LearningRate 0.0001 Epoch: 19 Global Step: 240500 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:35,082-Speed 3234.24 samples/sec Loss 0.7242 LearningRate 0.0001 Epoch: 19 Global Step: 240510 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:38,201-Speed 3285.42 samples/sec Loss 0.7029 LearningRate 0.0001 Epoch: 19 Global Step: 240520 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:41,307-Speed 3297.81 samples/sec Loss 0.7112 LearningRate 0.0001 Epoch: 19 Global Step: 240530 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:44,384-Speed 3328.95 samples/sec Loss 0.7279 LearningRate 0.0001 Epoch: 19 Global Step: 240540 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:47,550-Speed 3235.40 samples/sec Loss 0.7517 LearningRate 0.0001 Epoch: 19 Global Step: 240550 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:50,613-Speed 3343.57 samples/sec Loss 0.7392 LearningRate 0.0001 Epoch: 19 Global Step: 240560 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:53,760-Speed 3254.97 samples/sec Loss 0.7830 LearningRate 0.0001 Epoch: 19 Global Step: 240570 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:36:56,810-Speed 3358.37 samples/sec Loss 0.6955 LearningRate 0.0001 Epoch: 19 Global Step: 240580 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:00,019-Speed 3191.85 samples/sec Loss 0.7015 LearningRate 0.0001 Epoch: 19 Global Step: 240590 Fp16 Grad Scale: 8192 Required: 1 hours 
Training: 2022-04-27 22:37:03,190-Speed 3230.44 samples/sec Loss 0.7186 LearningRate 0.0001 Epoch: 19 Global Step: 240600 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:06,373-Speed 3217.69 samples/sec Loss 0.7481 LearningRate 0.0001 Epoch: 19 Global Step: 240610 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:09,451-Speed 3327.93 samples/sec Loss 0.6835 LearningRate 0.0001 Epoch: 19 Global Step: 240620 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:12,555-Speed 3300.23 samples/sec Loss 0.7032 LearningRate 0.0001 Epoch: 19 Global Step: 240630 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:15,673-Speed 3284.64 samples/sec Loss 0.7031 LearningRate 0.0001 Epoch: 19 Global Step: 240640 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:18,793-Speed 3284.27 samples/sec Loss 0.7242 LearningRate 0.0001 Epoch: 19 Global Step: 240650 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:21,869-Speed 3330.12 samples/sec Loss 0.7219 LearningRate 0.0001 Epoch: 19 Global Step: 240660 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:24,996-Speed 3275.41 samples/sec Loss 0.6977 LearningRate 0.0001 Epoch: 19 Global Step: 240670 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:28,107-Speed 3292.53 samples/sec Loss 0.7373 LearningRate 0.0001 Epoch: 19 Global Step: 240680 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:31,188-Speed 3324.40 samples/sec Loss 0.7414 LearningRate 0.0001 Epoch: 19 Global Step: 240690 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:37:34,286-Speed 3305.87 samples/sec Loss 0.7682 LearningRate 0.0001 Epoch: 19 Global Step: 240700 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:37,508-Speed 3179.67 samples/sec Loss 0.7572 LearningRate 0.0001 Epoch: 19 Global Step: 240710 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:40,617-Speed 3294.30 
samples/sec Loss 0.7306 LearningRate 0.0001 Epoch: 19 Global Step: 240720 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:43,784-Speed 3234.38 samples/sec Loss 0.7385 LearningRate 0.0001 Epoch: 19 Global Step: 240730 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:47,452-Speed 2792.71 samples/sec Loss 0.7418 LearningRate 0.0001 Epoch: 19 Global Step: 240740 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:50,621-Speed 3232.17 samples/sec Loss 0.7372 LearningRate 0.0001 Epoch: 19 Global Step: 240750 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:53,735-Speed 3289.97 samples/sec Loss 0.7153 LearningRate 0.0001 Epoch: 19 Global Step: 240760 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:37:58,054-Speed 2371.69 samples/sec Loss 0.6978 LearningRate 0.0001 Epoch: 19 Global Step: 240770 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:01,137-Speed 3321.82 samples/sec Loss 0.7233 LearningRate 0.0001 Epoch: 19 Global Step: 240780 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:04,771-Speed 2818.81 samples/sec Loss 0.7287 LearningRate 0.0001 Epoch: 19 Global Step: 240790 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:07,873-Speed 3302.81 samples/sec Loss 0.7134 LearningRate 0.0001 Epoch: 19 Global Step: 240800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:10,954-Speed 3323.71 samples/sec Loss 0.7362 LearningRate 0.0001 Epoch: 19 Global Step: 240810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:14,073-Speed 3284.28 samples/sec Loss 0.7212 LearningRate 0.0001 Epoch: 19 Global Step: 240820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:17,243-Speed 3231.71 samples/sec Loss 0.7149 LearningRate 0.0001 Epoch: 19 Global Step: 240830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:20,367-Speed 3279.01 samples/sec Loss 0.7609 LearningRate 0.0001 Epoch: 
19 Global Step: 240840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:23,488-Speed 3281.86 samples/sec Loss 0.7282 LearningRate 0.0001 Epoch: 19 Global Step: 240850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:26,649-Speed 3240.77 samples/sec Loss 0.7225 LearningRate 0.0001 Epoch: 19 Global Step: 240860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:29,705-Speed 3352.33 samples/sec Loss 0.7335 LearningRate 0.0001 Epoch: 19 Global Step: 240870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:32,788-Speed 3321.86 samples/sec Loss 0.7502 LearningRate 0.0001 Epoch: 19 Global Step: 240880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:38:35,878-Speed 3315.06 samples/sec Loss 0.7383 LearningRate 0.0001 Epoch: 19 Global Step: 240890 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:38,939-Speed 3347.09 samples/sec Loss 0.7299 LearningRate 0.0001 Epoch: 19 Global Step: 240900 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:42,000-Speed 3346.14 samples/sec Loss 0.7150 LearningRate 0.0001 Epoch: 19 Global Step: 240910 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:45,070-Speed 3336.86 samples/sec Loss 0.7267 LearningRate 0.0001 Epoch: 19 Global Step: 240920 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:48,150-Speed 3326.19 samples/sec Loss 0.7197 LearningRate 0.0001 Epoch: 19 Global Step: 240930 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:51,199-Speed 3359.17 samples/sec Loss 0.7485 LearningRate 0.0001 Epoch: 19 Global Step: 240940 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:54,332-Speed 3269.78 samples/sec Loss 0.7486 LearningRate 0.0001 Epoch: 19 Global Step: 240950 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:38:57,385-Speed 3354.58 samples/sec Loss 0.7156 LearningRate 0.0001 Epoch: 19 Global Step: 240960 Fp16 Grad Scale: 8192 
Required: 1 hours Training: 2022-04-27 22:39:00,467-Speed 3323.61 samples/sec Loss 0.7044 LearningRate 0.0001 Epoch: 19 Global Step: 240970 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:03,586-Speed 3284.51 samples/sec Loss 0.7326 LearningRate 0.0001 Epoch: 19 Global Step: 240980 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:06,651-Speed 3342.27 samples/sec Loss 0.6988 LearningRate 0.0001 Epoch: 19 Global Step: 240990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:39:09,737-Speed 3319.02 samples/sec Loss 0.7436 LearningRate 0.0001 Epoch: 19 Global Step: 241000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:39:12,816-Speed 3326.88 samples/sec Loss 0.7501 LearningRate 0.0001 Epoch: 19 Global Step: 241010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:39:15,949-Speed 3269.55 samples/sec Loss 0.7161 LearningRate 0.0001 Epoch: 19 Global Step: 241020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:39:19,022-Speed 3332.53 samples/sec Loss 0.7211 LearningRate 0.0001 Epoch: 19 Global Step: 241030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:22,135-Speed 3290.83 samples/sec Loss 0.7393 LearningRate 0.0001 Epoch: 19 Global Step: 241040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:25,313-Speed 3223.46 samples/sec Loss 0.7103 LearningRate 0.0001 Epoch: 19 Global Step: 241050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:28,366-Speed 3354.91 samples/sec Loss 0.7590 LearningRate 0.0001 Epoch: 19 Global Step: 241060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:31,451-Speed 3320.41 samples/sec Loss 0.7141 LearningRate 0.0001 Epoch: 19 Global Step: 241070 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:34,549-Speed 3305.86 samples/sec Loss 0.7462 LearningRate 0.0001 Epoch: 19 Global Step: 241080 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 
22:39:37,637-Speed 3317.60 samples/sec Loss 0.7577 LearningRate 0.0001 Epoch: 19 Global Step: 241090 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:40,772-Speed 3267.17 samples/sec Loss 0.7715 LearningRate 0.0001 Epoch: 19 Global Step: 241100 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:43,865-Speed 3311.61 samples/sec Loss 0.7073 LearningRate 0.0001 Epoch: 19 Global Step: 241110 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:46,938-Speed 3333.32 samples/sec Loss 0.7577 LearningRate 0.0001 Epoch: 19 Global Step: 241120 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:39:50,087-Speed 3253.14 samples/sec Loss 0.7352 LearningRate 0.0001 Epoch: 19 Global Step: 241130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:39:53,154-Speed 3339.05 samples/sec Loss 0.7080 LearningRate 0.0001 Epoch: 19 Global Step: 241140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:39:56,291-Speed 3266.08 samples/sec Loss 0.7364 LearningRate 0.0001 Epoch: 19 Global Step: 241150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:39:59,366-Speed 3331.01 samples/sec Loss 0.7600 LearningRate 0.0001 Epoch: 19 Global Step: 241160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:02,486-Speed 3283.18 samples/sec Loss 0.6863 LearningRate 0.0001 Epoch: 19 Global Step: 241170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:05,609-Speed 3279.79 samples/sec Loss 0.7557 LearningRate 0.0001 Epoch: 19 Global Step: 241180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:08,658-Speed 3359.09 samples/sec Loss 0.7458 LearningRate 0.0001 Epoch: 19 Global Step: 241190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:11,733-Speed 3331.81 samples/sec Loss 0.7124 LearningRate 0.0001 Epoch: 19 Global Step: 241200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:14,811-Speed 3327.84 samples/sec Loss 
0.7133 LearningRate 0.0001 Epoch: 19 Global Step: 241210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:17,891-Speed 3326.09 samples/sec Loss 0.7639 LearningRate 0.0001 Epoch: 19 Global Step: 241220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:20,960-Speed 3337.38 samples/sec Loss 0.7715 LearningRate 0.0001 Epoch: 19 Global Step: 241230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:40:24,023-Speed 3344.58 samples/sec Loss 0.7057 LearningRate 0.0001 Epoch: 19 Global Step: 241240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:40:27,108-Speed 3319.92 samples/sec Loss 0.7212 LearningRate 0.0001 Epoch: 19 Global Step: 241250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:40:30,225-Speed 3287.01 samples/sec Loss 0.7204 LearningRate 0.0001 Epoch: 19 Global Step: 241260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:40:33,299-Speed 3331.50 samples/sec Loss 0.7099 LearningRate 0.0001 Epoch: 19 Global Step: 241270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:36,379-Speed 3326.08 samples/sec Loss 0.6799 LearningRate 0.0001 Epoch: 19 Global Step: 241280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:39,545-Speed 3235.10 samples/sec Loss 0.7015 LearningRate 0.0001 Epoch: 19 Global Step: 241290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:42,667-Speed 3281.20 samples/sec Loss 0.7257 LearningRate 0.0001 Epoch: 19 Global Step: 241300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:45,794-Speed 3276.12 samples/sec Loss 0.7469 LearningRate 0.0001 Epoch: 19 Global Step: 241310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:48,898-Speed 3299.75 samples/sec Loss 0.6800 LearningRate 0.0001 Epoch: 19 Global Step: 241320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:52,063-Speed 3236.55 samples/sec Loss 0.6962 LearningRate 0.0001 Epoch: 19 Global 
Step: 241330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:55,225-Speed 3239.14 samples/sec Loss 0.7801 LearningRate 0.0001 Epoch: 19 Global Step: 241340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:40:58,306-Speed 3324.91 samples/sec Loss 0.7155 LearningRate 0.0001 Epoch: 19 Global Step: 241350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:01,394-Speed 3317.32 samples/sec Loss 0.7064 LearningRate 0.0001 Epoch: 19 Global Step: 241360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:04,509-Speed 3288.60 samples/sec Loss 0.6807 LearningRate 0.0001 Epoch: 19 Global Step: 241370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:41:07,584-Speed 3331.45 samples/sec Loss 0.7461 LearningRate 0.0001 Epoch: 19 Global Step: 241380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 22:41:10,685-Speed 3302.35 samples/sec Loss 0.7229 LearningRate 0.0001 Epoch: 19 Global Step: 241390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:13,828-Speed 3259.14 samples/sec Loss 0.7474 LearningRate 0.0001 Epoch: 19 Global Step: 241400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:17,005-Speed 3224.79 samples/sec Loss 0.7183 LearningRate 0.0001 Epoch: 19 Global Step: 241410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:20,167-Speed 3238.90 samples/sec Loss 0.7180 LearningRate 0.0001 Epoch: 19 Global Step: 241420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:23,253-Speed 3319.07 samples/sec Loss 0.7203 LearningRate 0.0001 Epoch: 19 Global Step: 241430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:26,374-Speed 3282.31 samples/sec Loss 0.7073 LearningRate 0.0001 Epoch: 19 Global Step: 241440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:29,535-Speed 3241.04 samples/sec Loss 0.7343 LearningRate 0.0001 Epoch: 19 Global Step: 241450 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-27 22:41:32,619-Speed 3320.57 samples/sec Loss 0.7449 LearningRate 0.0001 Epoch: 19 Global Step: 241460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:35,710-Speed 3314.23 samples/sec Loss 0.7344 LearningRate 0.0001 Epoch: 19 Global Step: 241470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:41:38,856-Speed 3256.43 samples/sec Loss 0.7010 LearningRate 0.0001 Epoch: 19 Global Step: 241480 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:41:41,971-Speed 3287.98 samples/sec Loss 0.7478 LearningRate 0.0001 Epoch: 19 Global Step: 241490 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:41:45,114-Speed 3259.11 samples/sec Loss 0.7025 LearningRate 0.0001 Epoch: 19 Global Step: 241500 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:41:48,203-Speed 3315.48 samples/sec Loss 0.7192 LearningRate 0.0001 Epoch: 19 Global Step: 241510 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:41:51,345-Speed 3259.76 samples/sec Loss 0.7116 LearningRate 0.0001 Epoch: 19 Global Step: 241520 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:41:54,467-Speed 3281.07 samples/sec Loss 0.7765 LearningRate 0.0001 Epoch: 19 Global Step: 241530 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:41:57,534-Speed 3340.00 samples/sec Loss 0.7195 LearningRate 0.0001 Epoch: 19 Global Step: 241540 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:00,657-Speed 3280.18 samples/sec Loss 0.7308 LearningRate 0.0001 Epoch: 19 Global Step: 241550 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:03,746-Speed 3315.80 samples/sec Loss 0.7271 LearningRate 0.0001 Epoch: 19 Global Step: 241560 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:06,924-Speed 3223.68 samples/sec Loss 0.7062 LearningRate 0.0001 Epoch: 19 Global Step: 241570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 
22:42:10,021-Speed 3307.35 samples/sec Loss 0.7289 LearningRate 0.0001 Epoch: 19 Global Step: 241580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:42:13,142-Speed 3282.08 samples/sec Loss 0.6989 LearningRate 0.0001 Epoch: 19 Global Step: 241590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:16,267-Speed 3278.13 samples/sec Loss 0.6727 LearningRate 0.0001 Epoch: 19 Global Step: 241600 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:19,432-Speed 3236.25 samples/sec Loss 0.7349 LearningRate 0.0001 Epoch: 19 Global Step: 241610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:22,548-Speed 3286.69 samples/sec Loss 0.7309 LearningRate 0.0001 Epoch: 19 Global Step: 241620 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:25,739-Speed 3210.55 samples/sec Loss 0.7164 LearningRate 0.0001 Epoch: 19 Global Step: 241630 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:28,896-Speed 3244.84 samples/sec Loss 0.7179 LearningRate 0.0001 Epoch: 19 Global Step: 241640 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:32,053-Speed 3244.00 samples/sec Loss 0.7385 LearningRate 0.0001 Epoch: 19 Global Step: 241650 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:35,119-Speed 3341.13 samples/sec Loss 0.7407 LearningRate 0.0001 Epoch: 19 Global Step: 241660 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:38,274-Speed 3246.76 samples/sec Loss 0.7451 LearningRate 0.0001 Epoch: 19 Global Step: 241670 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:41,403-Speed 3273.47 samples/sec Loss 0.7426 LearningRate 0.0001 Epoch: 19 Global Step: 241680 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:42:44,492-Speed 3315.96 samples/sec Loss 0.7348 LearningRate 0.0001 Epoch: 19 Global Step: 241690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:42:47,668-Speed 3225.02 samples/sec Loss 0.6959 
LearningRate 0.0001 Epoch: 19 Global Step: 241700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:42:50,813-Speed 3257.31 samples/sec Loss 0.7292 LearningRate 0.0001 Epoch: 19 Global Step: 241710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:42:53,878-Speed 3342.17 samples/sec Loss 0.7075 LearningRate 0.0001 Epoch: 19 Global Step: 241720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:42:56,947-Speed 3337.10 samples/sec Loss 0.7356 LearningRate 0.0001 Epoch: 19 Global Step: 241730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:00,022-Speed 3331.57 samples/sec Loss 0.7303 LearningRate 0.0001 Epoch: 19 Global Step: 241740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:03,110-Speed 3316.72 samples/sec Loss 0.6988 LearningRate 0.0001 Epoch: 19 Global Step: 241750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:06,237-Speed 3276.30 samples/sec Loss 0.7436 LearningRate 0.0001 Epoch: 19 Global Step: 241760 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:09,330-Speed 3311.12 samples/sec Loss 0.7143 LearningRate 0.0001 Epoch: 19 Global Step: 241770 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:12,439-Speed 3294.83 samples/sec Loss 0.7015 LearningRate 0.0001 Epoch: 19 Global Step: 241780 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:15,571-Speed 3270.99 samples/sec Loss 0.7431 LearningRate 0.0001 Epoch: 19 Global Step: 241790 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:18,704-Speed 3269.35 samples/sec Loss 0.7321 LearningRate 0.0001 Epoch: 19 Global Step: 241800 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:21,750-Speed 3361.81 samples/sec Loss 0.7720 LearningRate 0.0001 Epoch: 19 Global Step: 241810 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:25,461-Speed 2760.47 samples/sec Loss 0.7270 LearningRate 0.0001 Epoch: 19 Global Step: 241820 
Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:28,576-Speed 3287.98 samples/sec Loss 0.7235 LearningRate 0.0001 Epoch: 19 Global Step: 241830 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:31,736-Speed 3241.86 samples/sec Loss 0.7306 LearningRate 0.0001 Epoch: 19 Global Step: 241840 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:34,799-Speed 3344.35 samples/sec Loss 0.6995 LearningRate 0.0001 Epoch: 19 Global Step: 241850 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:43:37,897-Speed 3305.80 samples/sec Loss 0.7326 LearningRate 0.0001 Epoch: 19 Global Step: 241860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:41,066-Speed 3232.86 samples/sec Loss 0.7552 LearningRate 0.0001 Epoch: 19 Global Step: 241870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:44,130-Speed 3342.87 samples/sec Loss 0.7084 LearningRate 0.0001 Epoch: 19 Global Step: 241880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:47,291-Speed 3240.98 samples/sec Loss 0.6943 LearningRate 0.0001 Epoch: 19 Global Step: 241890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:50,397-Speed 3297.83 samples/sec Loss 0.7200 LearningRate 0.0001 Epoch: 19 Global Step: 241900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:53,500-Speed 3301.45 samples/sec Loss 0.7293 LearningRate 0.0001 Epoch: 19 Global Step: 241910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:56,630-Speed 3272.08 samples/sec Loss 0.7582 LearningRate 0.0001 Epoch: 19 Global Step: 241920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:43:59,834-Speed 3196.20 samples/sec Loss 0.7641 LearningRate 0.0001 Epoch: 19 Global Step: 241930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:44:02,934-Speed 3304.22 samples/sec Loss 0.7331 LearningRate 0.0001 Epoch: 19 Global Step: 241940 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-27 22:44:06,009-Speed 3331.30 samples/sec Loss 0.7311 LearningRate 0.0001 Epoch: 19 Global Step: 241950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:44:09,080-Speed 3336.09 samples/sec Loss 0.7203 LearningRate 0.0001 Epoch: 19 Global Step: 241960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:44:12,166-Speed 3318.73 samples/sec Loss 0.7300 LearningRate 0.0001 Epoch: 19 Global Step: 241970 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:15,286-Speed 3283.54 samples/sec Loss 0.7450 LearningRate 0.0001 Epoch: 19 Global Step: 241980 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:18,362-Speed 3330.32 samples/sec Loss 0.7309 LearningRate 0.0001 Epoch: 19 Global Step: 241990 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:21,412-Speed 3357.42 samples/sec Loss 0.7325 LearningRate 0.0001 Epoch: 19 Global Step: 242000 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:24,555-Speed 3259.10 samples/sec Loss 0.7809 LearningRate 0.0001 Epoch: 19 Global Step: 242010 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:27,696-Speed 3261.02 samples/sec Loss 0.7016 LearningRate 0.0001 Epoch: 19 Global Step: 242020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:30,791-Speed 3309.92 samples/sec Loss 0.7229 LearningRate 0.0001 Epoch: 19 Global Step: 242030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:33,868-Speed 3329.39 samples/sec Loss 0.7245 LearningRate 0.0001 Epoch: 19 Global Step: 242040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:36,938-Speed 3336.95 samples/sec Loss 0.7523 LearningRate 0.0001 Epoch: 19 Global Step: 242050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:40,047-Speed 3294.80 samples/sec Loss 0.7313 LearningRate 0.0001 Epoch: 19 Global Step: 242060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:43,229-Speed 3218.80 
samples/sec Loss 0.7352 LearningRate 0.0001 Epoch: 19 Global Step: 242070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:44:46,304-Speed 3331.58 samples/sec Loss 0.7336 LearningRate 0.0001 Epoch: 19 Global Step: 242080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:44:49,441-Speed 3265.29 samples/sec Loss 0.7825 LearningRate 0.0001 Epoch: 19 Global Step: 242090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:44:52,590-Speed 3253.18 samples/sec Loss 0.7480 LearningRate 0.0001 Epoch: 19 Global Step: 242100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:44:55,663-Speed 3333.86 samples/sec Loss 0.7159 LearningRate 0.0001 Epoch: 19 Global Step: 242110 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:44:58,761-Speed 3306.10 samples/sec Loss 0.7236 LearningRate 0.0001 Epoch: 19 Global Step: 242120 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:01,896-Speed 3267.06 samples/sec Loss 0.7353 LearningRate 0.0001 Epoch: 19 Global Step: 242130 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:05,007-Speed 3292.98 samples/sec Loss 0.7110 LearningRate 0.0001 Epoch: 19 Global Step: 242140 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:08,089-Speed 3323.67 samples/sec Loss 0.7276 LearningRate 0.0001 Epoch: 19 Global Step: 242150 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:11,176-Speed 3317.82 samples/sec Loss 0.7267 LearningRate 0.0001 Epoch: 19 Global Step: 242160 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:14,274-Speed 3306.89 samples/sec Loss 0.7293 LearningRate 0.0001 Epoch: 19 Global Step: 242170 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:17,365-Speed 3313.33 samples/sec Loss 0.7483 LearningRate 0.0001 Epoch: 19 Global Step: 242180 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:20,463-Speed 3306.33 samples/sec Loss 0.7476 LearningRate 0.0001 Epoch: 
19 Global Step: 242190 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:23,650-Speed 3214.49 samples/sec Loss 0.7188 LearningRate 0.0001 Epoch: 19 Global Step: 242200 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:26,775-Speed 3278.61 samples/sec Loss 0.7122 LearningRate 0.0001 Epoch: 19 Global Step: 242210 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:29,912-Speed 3264.16 samples/sec Loss 0.7240 LearningRate 0.0001 Epoch: 19 Global Step: 242220 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:33,008-Speed 3309.38 samples/sec Loss 0.7035 LearningRate 0.0001 Epoch: 19 Global Step: 242230 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:36,098-Speed 3314.36 samples/sec Loss 0.7049 LearningRate 0.0001 Epoch: 19 Global Step: 242240 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:39,233-Speed 3267.10 samples/sec Loss 0.7058 LearningRate 0.0001 Epoch: 19 Global Step: 242250 Fp16 Grad Scale: 4096 Required: 1 hours Training: 2022-04-27 22:45:42,353-Speed 3283.62 samples/sec Loss 0.7028 LearningRate 0.0001 Epoch: 19 Global Step: 242260 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:45,449-Speed 3309.11 samples/sec Loss 0.6885 LearningRate 0.0001 Epoch: 19 Global Step: 242270 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:48,616-Speed 3234.12 samples/sec Loss 0.7544 LearningRate 0.0001 Epoch: 19 Global Step: 242280 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:51,732-Speed 3286.54 samples/sec Loss 0.7168 LearningRate 0.0001 Epoch: 19 Global Step: 242290 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:54,823-Speed 3314.76 samples/sec Loss 0.7159 LearningRate 0.0001 Epoch: 19 Global Step: 242300 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:45:57,915-Speed 3312.59 samples/sec Loss 0.7308 LearningRate 0.0001 Epoch: 19 Global Step: 242310 Fp16 Grad Scale: 8192 Required: 
1 hours Training: 2022-04-27 22:46:01,043-Speed 3274.48 samples/sec Loss 0.7046 LearningRate 0.0001 Epoch: 19 Global Step: 242320 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:04,131-Speed 3316.87 samples/sec Loss 0.7142 LearningRate 0.0001 Epoch: 19 Global Step: 242330 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:07,249-Speed 3285.92 samples/sec Loss 0.7626 LearningRate 0.0001 Epoch: 19 Global Step: 242340 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:10,315-Speed 3340.17 samples/sec Loss 0.7854 LearningRate 0.0001 Epoch: 19 Global Step: 242350 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:13,471-Speed 3246.17 samples/sec Loss 0.7218 LearningRate 0.0001 Epoch: 19 Global Step: 242360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:46:16,563-Speed 3312.07 samples/sec Loss 0.7187 LearningRate 0.0001 Epoch: 19 Global Step: 242370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:46:19,672-Speed 3294.90 samples/sec Loss 0.6855 LearningRate 0.0001 Epoch: 19 Global Step: 242380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:46:22,743-Speed 3336.03 samples/sec Loss 0.7317 LearningRate 0.0001 Epoch: 19 Global Step: 242390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:46:25,826-Speed 3322.22 samples/sec Loss 0.7056 LearningRate 0.0001 Epoch: 19 Global Step: 242400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:46:28,928-Speed 3301.90 samples/sec Loss 0.7372 LearningRate 0.0001 Epoch: 19 Global Step: 242410 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:32,047-Speed 3284.53 samples/sec Loss 0.6978 LearningRate 0.0001 Epoch: 19 Global Step: 242420 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:35,150-Speed 3301.03 samples/sec Loss 0.7633 LearningRate 0.0001 Epoch: 19 Global Step: 242430 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:38,310-Speed 
3241.90 samples/sec Loss 0.6804 LearningRate 0.0001 Epoch: 19 Global Step: 242440 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:41,408-Speed 3308.02 samples/sec Loss 0.7274 LearningRate 0.0001 Epoch: 19 Global Step: 242450 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:44,493-Speed 3320.76 samples/sec Loss 0.7016 LearningRate 0.0001 Epoch: 19 Global Step: 242460 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:47,669-Speed 3225.20 samples/sec Loss 0.7318 LearningRate 0.0001 Epoch: 19 Global Step: 242470 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:50,839-Speed 3231.22 samples/sec Loss 0.7540 LearningRate 0.0001 Epoch: 19 Global Step: 242480 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:53,925-Speed 3318.78 samples/sec Loss 0.7183 LearningRate 0.0001 Epoch: 19 Global Step: 242490 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:46:57,007-Speed 3323.98 samples/sec Loss 0.6977 LearningRate 0.0001 Epoch: 19 Global Step: 242500 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:00,128-Speed 3281.55 samples/sec Loss 0.7215 LearningRate 0.0001 Epoch: 19 Global Step: 242510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:47:03,281-Speed 3248.38 samples/sec Loss 0.7175 LearningRate 0.0001 Epoch: 19 Global Step: 242520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:47:06,387-Speed 3297.95 samples/sec Loss 0.7010 LearningRate 0.0001 Epoch: 19 Global Step: 242530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:47:09,470-Speed 3323.29 samples/sec Loss 0.7322 LearningRate 0.0001 Epoch: 19 Global Step: 242540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 22:47:12,643-Speed 3227.74 samples/sec Loss 0.7321 LearningRate 0.0001 Epoch: 19 Global Step: 242550 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:15,770-Speed 3276.42 samples/sec Loss 0.6899 LearningRate 0.0001 
Epoch: 19 Global Step: 242560 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:18,906-Speed 3266.24 samples/sec Loss 0.7029 LearningRate 0.0001 Epoch: 19 Global Step: 242570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:21,988-Speed 3322.52 samples/sec Loss 0.6983 LearningRate 0.0001 Epoch: 19 Global Step: 242580 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:25,155-Speed 3234.98 samples/sec Loss 0.7231 LearningRate 0.0001 Epoch: 19 Global Step: 242590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:28,232-Speed 3328.73 samples/sec Loss 0.7054 LearningRate 0.0001 Epoch: 19 Global Step: 242600 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:31,426-Speed 3207.71 samples/sec Loss 0.7212 LearningRate 0.0001 Epoch: 19 Global Step: 242610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:34,495-Speed 3336.62 samples/sec Loss 0.7119 LearningRate 0.0001 Epoch: 19 Global Step: 242620 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:37,577-Speed 3323.45 samples/sec Loss 0.6950 LearningRate 0.0001 Epoch: 19 Global Step: 242630 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:40,651-Speed 3332.92 samples/sec Loss 0.7396 LearningRate 0.0001 Epoch: 19 Global Step: 242640 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 22:47:43,777-Speed 3276.23 samples/sec Loss 0.7146 LearningRate 0.0001 Epoch: 19 Global Step: 242650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:47:46,870-Speed 3311.94 samples/sec Loss 0.7487 LearningRate 0.0001 Epoch: 19 Global Step: 242660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:47:50,006-Speed 3267.08 samples/sec Loss 0.7250 LearningRate 0.0001 Epoch: 19 Global Step: 242670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:47:53,173-Speed 3233.76 samples/sec Loss 0.7285 LearningRate 0.0001 Epoch: 19 Global Step: 242680 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-27 22:47:56,260-Speed 3318.19 samples/sec Loss 0.7248 LearningRate 0.0001 Epoch: 19 Global Step: 242690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:47:59,353-Speed 3311.94 samples/sec Loss 0.7019 LearningRate 0.0001 Epoch: 19 Global Step: 242700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:02,548-Speed 3206.52 samples/sec Loss 0.7063 LearningRate 0.0001 Epoch: 19 Global Step: 242710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:05,657-Speed 3294.04 samples/sec Loss 0.7390 LearningRate 0.0001 Epoch: 19 Global Step: 242720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:08,755-Speed 3306.96 samples/sec Loss 0.7392 LearningRate 0.0001 Epoch: 19 Global Step: 242730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:11,859-Speed 3299.67 samples/sec Loss 0.7157 LearningRate 0.0001 Epoch: 19 Global Step: 242740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:15,006-Speed 3254.59 samples/sec Loss 0.7131 LearningRate 0.0001 Epoch: 19 Global Step: 242750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:48:18,184-Speed 3223.79 samples/sec Loss 0.7131 LearningRate 0.0001 Epoch: 19 Global Step: 242760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:48:21,267-Speed 3322.02 samples/sec Loss 0.6836 LearningRate 0.0001 Epoch: 19 Global Step: 242770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:48:24,468-Speed 3200.08 samples/sec Loss 0.7375 LearningRate 0.0001 Epoch: 19 Global Step: 242780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:27,599-Speed 3272.10 samples/sec Loss 0.7213 LearningRate 0.0001 Epoch: 19 Global Step: 242790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:30,786-Speed 3214.25 samples/sec Loss 0.7335 LearningRate 0.0001 Epoch: 19 Global Step: 242800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 
22:48:33,860-Speed 3331.86 samples/sec Loss 0.7387 LearningRate 0.0001 Epoch: 19 Global Step: 242810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:36,996-Speed 3266.89 samples/sec Loss 0.7081 LearningRate 0.0001 Epoch: 19 Global Step: 242820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:40,097-Speed 3303.21 samples/sec Loss 0.7422 LearningRate 0.0001 Epoch: 19 Global Step: 242830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:43,165-Speed 3338.20 samples/sec Loss 0.7092 LearningRate 0.0001 Epoch: 19 Global Step: 242840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:46,230-Speed 3342.55 samples/sec Loss 0.7183 LearningRate 0.0001 Epoch: 19 Global Step: 242850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:49,455-Speed 3176.19 samples/sec Loss 0.6728 LearningRate 0.0001 Epoch: 19 Global Step: 242860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:52,515-Speed 3347.17 samples/sec Loss 0.7159 LearningRate 0.0000 Epoch: 19 Global Step: 242870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:55,604-Speed 3316.49 samples/sec Loss 0.7282 LearningRate 0.0000 Epoch: 19 Global Step: 242880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:48:58,676-Speed 3334.06 samples/sec Loss 0.6858 LearningRate 0.0000 Epoch: 19 Global Step: 242890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:01,796-Speed 3283.10 samples/sec Loss 0.7367 LearningRate 0.0000 Epoch: 19 Global Step: 242900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:04,880-Speed 3321.96 samples/sec Loss 0.7169 LearningRate 0.0000 Epoch: 19 Global Step: 242910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:08,054-Speed 3226.99 samples/sec Loss 0.7170 LearningRate 0.0000 Epoch: 19 Global Step: 242920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:11,152-Speed 3306.27 samples/sec Loss 
0.7278 LearningRate 0.0000 Epoch: 19 Global Step: 242930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:14,331-Speed 3222.18 samples/sec Loss 0.7465 LearningRate 0.0000 Epoch: 19 Global Step: 242940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:17,566-Speed 3166.74 samples/sec Loss 0.6885 LearningRate 0.0000 Epoch: 19 Global Step: 242950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:20,710-Speed 3258.10 samples/sec Loss 0.6814 LearningRate 0.0000 Epoch: 19 Global Step: 242960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:23,792-Speed 3322.66 samples/sec Loss 0.7548 LearningRate 0.0000 Epoch: 19 Global Step: 242970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:26,888-Speed 3309.25 samples/sec Loss 0.7531 LearningRate 0.0000 Epoch: 19 Global Step: 242980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:49:29,986-Speed 3305.96 samples/sec Loss 0.7177 LearningRate 0.0000 Epoch: 19 Global Step: 242990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:33,040-Speed 3354.57 samples/sec Loss 0.7683 LearningRate 0.0000 Epoch: 19 Global Step: 243000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:36,097-Speed 3350.56 samples/sec Loss 0.6877 LearningRate 0.0000 Epoch: 19 Global Step: 243010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:39,187-Speed 3314.68 samples/sec Loss 0.7210 LearningRate 0.0000 Epoch: 19 Global Step: 243020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:42,271-Speed 3321.62 samples/sec Loss 0.7137 LearningRate 0.0000 Epoch: 19 Global Step: 243030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:45,377-Speed 3297.93 samples/sec Loss 0.7148 LearningRate 0.0000 Epoch: 19 Global Step: 243040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:48,479-Speed 3301.79 samples/sec Loss 0.7031 LearningRate 0.0000 Epoch: 19 Global 
Step: 243050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:51,557-Speed 3328.23 samples/sec Loss 0.7267 LearningRate 0.0000 Epoch: 19 Global Step: 243060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:54,734-Speed 3224.23 samples/sec Loss 0.7279 LearningRate 0.0000 Epoch: 19 Global Step: 243070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:49:57,808-Speed 3331.37 samples/sec Loss 0.7468 LearningRate 0.0000 Epoch: 19 Global Step: 243080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:00,926-Speed 3285.49 samples/sec Loss 0.7264 LearningRate 0.0000 Epoch: 19 Global Step: 243090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:50:04,077-Speed 3251.35 samples/sec Loss 0.6963 LearningRate 0.0000 Epoch: 19 Global Step: 243100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:07,151-Speed 3332.02 samples/sec Loss 0.7224 LearningRate 0.0000 Epoch: 19 Global Step: 243110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:10,217-Speed 3341.52 samples/sec Loss 0.7279 LearningRate 0.0000 Epoch: 19 Global Step: 243120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:13,311-Speed 3310.24 samples/sec Loss 0.7329 LearningRate 0.0000 Epoch: 19 Global Step: 243130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:16,363-Speed 3356.02 samples/sec Loss 0.7375 LearningRate 0.0000 Epoch: 19 Global Step: 243140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:19,467-Speed 3299.67 samples/sec Loss 0.7266 LearningRate 0.0000 Epoch: 19 Global Step: 243150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:22,530-Speed 3344.46 samples/sec Loss 0.7266 LearningRate 0.0000 Epoch: 19 Global Step: 243160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:25,599-Speed 3338.25 samples/sec Loss 0.7363 LearningRate 0.0000 Epoch: 19 Global Step: 243170 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-27 22:50:28,693-Speed 3310.55 samples/sec Loss 0.6879 LearningRate 0.0000 Epoch: 19 Global Step: 243180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:31,751-Speed 3349.74 samples/sec Loss 0.7243 LearningRate 0.0000 Epoch: 19 Global Step: 243190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:34,840-Speed 3316.09 samples/sec Loss 0.7215 LearningRate 0.0000 Epoch: 19 Global Step: 243200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:50:37,896-Speed 3351.49 samples/sec Loss 0.7379 LearningRate 0.0000 Epoch: 19 Global Step: 243210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:50:41,041-Speed 3257.17 samples/sec Loss 0.7387 LearningRate 0.0000 Epoch: 19 Global Step: 243220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:44,130-Speed 3315.12 samples/sec Loss 0.7355 LearningRate 0.0000 Epoch: 19 Global Step: 243230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:47,220-Speed 3316.28 samples/sec Loss 0.6997 LearningRate 0.0000 Epoch: 19 Global Step: 243240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:50,315-Speed 3308.75 samples/sec Loss 0.7427 LearningRate 0.0000 Epoch: 19 Global Step: 243250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:53,482-Speed 3234.18 samples/sec Loss 0.7369 LearningRate 0.0000 Epoch: 19 Global Step: 243260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:56,571-Speed 3316.42 samples/sec Loss 0.7034 LearningRate 0.0000 Epoch: 19 Global Step: 243270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:50:59,685-Speed 3289.36 samples/sec Loss 0.7593 LearningRate 0.0000 Epoch: 19 Global Step: 243280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:02,854-Speed 3231.86 samples/sec Loss 0.7120 LearningRate 0.0000 Epoch: 19 Global Step: 243290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 
22:51:05,923-Speed 3338.13 samples/sec Loss 0.7186 LearningRate 0.0000 Epoch: 19 Global Step: 243300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:09,029-Speed 3297.93 samples/sec Loss 0.7381 LearningRate 0.0000 Epoch: 19 Global Step: 243310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:12,111-Speed 3322.91 samples/sec Loss 0.7675 LearningRate 0.0000 Epoch: 19 Global Step: 243320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:51:15,247-Speed 3267.03 samples/sec Loss 0.7361 LearningRate 0.0000 Epoch: 19 Global Step: 243330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:51:18,428-Speed 3219.62 samples/sec Loss 0.7173 LearningRate 0.0000 Epoch: 19 Global Step: 243340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:51:21,499-Speed 3335.95 samples/sec Loss 0.7264 LearningRate 0.0000 Epoch: 19 Global Step: 243350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:51:24,579-Speed 3325.84 samples/sec Loss 0.7083 LearningRate 0.0000 Epoch: 19 Global Step: 243360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:27,777-Speed 3202.62 samples/sec Loss 0.7388 LearningRate 0.0000 Epoch: 19 Global Step: 243370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:30,890-Speed 3290.67 samples/sec Loss 0.7244 LearningRate 0.0000 Epoch: 19 Global Step: 243380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:33,945-Speed 3353.18 samples/sec Loss 0.7170 LearningRate 0.0000 Epoch: 19 Global Step: 243390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:37,044-Speed 3305.00 samples/sec Loss 0.7262 LearningRate 0.0000 Epoch: 19 Global Step: 243400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:40,101-Speed 3350.41 samples/sec Loss 0.7184 LearningRate 0.0000 Epoch: 19 Global Step: 243410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:43,259-Speed 3244.12 samples/sec Loss 
0.7263 LearningRate 0.0000 Epoch: 19 Global Step: 243420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:46,366-Speed 3296.26 samples/sec Loss 0.7243 LearningRate 0.0000 Epoch: 19 Global Step: 243430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:49,467-Speed 3303.97 samples/sec Loss 0.7497 LearningRate 0.0000 Epoch: 19 Global Step: 243440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:52,623-Speed 3245.64 samples/sec Loss 0.7168 LearningRate 0.0000 Epoch: 19 Global Step: 243450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:51:55,736-Speed 3290.40 samples/sec Loss 0.7272 LearningRate 0.0000 Epoch: 19 Global Step: 243460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:51:58,773-Speed 3372.83 samples/sec Loss 0.7377 LearningRate 0.0000 Epoch: 19 Global Step: 243470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:01,950-Speed 3223.99 samples/sec Loss 0.7516 LearningRate 0.0000 Epoch: 19 Global Step: 243480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:05,094-Speed 3258.65 samples/sec Loss 0.7024 LearningRate 0.0000 Epoch: 19 Global Step: 243490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:08,213-Speed 3283.43 samples/sec Loss 0.7121 LearningRate 0.0000 Epoch: 19 Global Step: 243500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:11,310-Speed 3307.62 samples/sec Loss 0.6973 LearningRate 0.0000 Epoch: 19 Global Step: 243510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:14,390-Speed 3326.01 samples/sec Loss 0.7584 LearningRate 0.0000 Epoch: 19 Global Step: 243520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:17,494-Speed 3300.15 samples/sec Loss 0.7277 LearningRate 0.0000 Epoch: 19 Global Step: 243530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:20,584-Speed 3314.79 samples/sec Loss 0.7249 LearningRate 0.0000 Epoch: 19 Global 
Step: 243540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:23,698-Speed 3289.37 samples/sec Loss 0.6783 LearningRate 0.0000 Epoch: 19 Global Step: 243550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:26,854-Speed 3246.18 samples/sec Loss 0.7123 LearningRate 0.0000 Epoch: 19 Global Step: 243560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:29,996-Speed 3259.52 samples/sec Loss 0.7252 LearningRate 0.0000 Epoch: 19 Global Step: 243570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:52:33,111-Speed 3288.79 samples/sec Loss 0.7259 LearningRate 0.0000 Epoch: 19 Global Step: 243580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:52:36,208-Speed 3307.05 samples/sec Loss 0.7016 LearningRate 0.0000 Epoch: 19 Global Step: 243590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:39,382-Speed 3227.34 samples/sec Loss 0.7380 LearningRate 0.0000 Epoch: 19 Global Step: 243600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:42,546-Speed 3236.98 samples/sec Loss 0.7110 LearningRate 0.0000 Epoch: 19 Global Step: 243610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:45,652-Speed 3297.80 samples/sec Loss 0.7004 LearningRate 0.0000 Epoch: 19 Global Step: 243620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:48,751-Speed 3305.77 samples/sec Loss 0.7275 LearningRate 0.0000 Epoch: 19 Global Step: 243630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:51,867-Speed 3287.20 samples/sec Loss 0.7268 LearningRate 0.0000 Epoch: 19 Global Step: 243640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:54,954-Speed 3318.54 samples/sec Loss 0.7176 LearningRate 0.0000 Epoch: 19 Global Step: 243650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:52:58,025-Speed 3335.82 samples/sec Loss 0.7154 LearningRate 0.0000 Epoch: 19 Global Step: 243660 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-27 22:53:01,122-Speed 3307.42 samples/sec Loss 0.7523 LearningRate 0.0000 Epoch: 19 Global Step: 243670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:04,264-Speed 3259.00 samples/sec Loss 0.7278 LearningRate 0.0000 Epoch: 19 Global Step: 243680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:07,327-Speed 3344.93 samples/sec Loss 0.7113 LearningRate 0.0000 Epoch: 19 Global Step: 243690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:10,436-Speed 3294.22 samples/sec Loss 0.7580 LearningRate 0.0000 Epoch: 19 Global Step: 243700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:13,642-Speed 3195.32 samples/sec Loss 0.7388 LearningRate 0.0000 Epoch: 19 Global Step: 243710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:16,805-Speed 3238.35 samples/sec Loss 0.7373 LearningRate 0.0000 Epoch: 19 Global Step: 243720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:19,877-Speed 3334.41 samples/sec Loss 0.7274 LearningRate 0.0000 Epoch: 19 Global Step: 243730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:22,983-Speed 3298.11 samples/sec Loss 0.7494 LearningRate 0.0000 Epoch: 19 Global Step: 243740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:26,187-Speed 3196.41 samples/sec Loss 0.7301 LearningRate 0.0000 Epoch: 19 Global Step: 243750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:29,306-Speed 3284.35 samples/sec Loss 0.7434 LearningRate 0.0000 Epoch: 19 Global Step: 243760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:32,414-Speed 3295.64 samples/sec Loss 0.7304 LearningRate 0.0000 Epoch: 19 Global Step: 243770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:35,565-Speed 3251.36 samples/sec Loss 0.7134 LearningRate 0.0000 Epoch: 19 Global Step: 243780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 
22:53:38,685-Speed 3282.66 samples/sec Loss 0.7390 LearningRate 0.0000 Epoch: 19 Global Step: 243790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:53:41,897-Speed 3189.03 samples/sec Loss 0.7205 LearningRate 0.0000 Epoch: 19 Global Step: 243800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:53:44,959-Speed 3344.58 samples/sec Loss 0.6982 LearningRate 0.0000 Epoch: 19 Global Step: 243810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:48,083-Speed 3279.46 samples/sec Loss 0.7046 LearningRate 0.0000 Epoch: 19 Global Step: 243820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:51,302-Speed 3182.00 samples/sec Loss 0.6955 LearningRate 0.0000 Epoch: 19 Global Step: 243830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:54,399-Speed 3307.79 samples/sec Loss 0.6879 LearningRate 0.0000 Epoch: 19 Global Step: 243840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:53:57,496-Speed 3307.51 samples/sec Loss 0.7145 LearningRate 0.0000 Epoch: 19 Global Step: 243850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:00,550-Speed 3353.73 samples/sec Loss 0.7381 LearningRate 0.0000 Epoch: 19 Global Step: 243860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:03,607-Speed 3350.96 samples/sec Loss 0.7070 LearningRate 0.0000 Epoch: 19 Global Step: 243870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:06,729-Speed 3281.43 samples/sec Loss 0.7284 LearningRate 0.0000 Epoch: 19 Global Step: 243880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:09,797-Speed 3338.22 samples/sec Loss 0.7237 LearningRate 0.0000 Epoch: 19 Global Step: 243890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:12,956-Speed 3242.21 samples/sec Loss 0.7195 LearningRate 0.0000 Epoch: 19 Global Step: 243900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:16,104-Speed 3254.33 samples/sec Loss 
0.7205 LearningRate 0.0000 Epoch: 19 Global Step: 243910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:54:19,152-Speed 3359.78 samples/sec Loss 0.7261 LearningRate 0.0000 Epoch: 19 Global Step: 243920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:22,274-Speed 3281.79 samples/sec Loss 0.6814 LearningRate 0.0000 Epoch: 19 Global Step: 243930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:25,422-Speed 3253.45 samples/sec Loss 0.7098 LearningRate 0.0000 Epoch: 19 Global Step: 243940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:28,571-Speed 3253.24 samples/sec Loss 0.7594 LearningRate 0.0000 Epoch: 19 Global Step: 243950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:31,626-Speed 3352.06 samples/sec Loss 0.7074 LearningRate 0.0000 Epoch: 19 Global Step: 243960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:34,719-Speed 3311.73 samples/sec Loss 0.7180 LearningRate 0.0000 Epoch: 19 Global Step: 243970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:54:37,771-Speed 3356.91 samples/sec Loss 0.7332 LearningRate 0.0000 Epoch: 19 Global Step: 243980 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:54:40,827-Speed 3351.59 samples/sec Loss 0.7168 LearningRate 0.0000 Epoch: 19 Global Step: 243990 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:54:43,966-Speed 3263.21 samples/sec Loss 0.7154 LearningRate 0.0000 Epoch: 19 Global Step: 244000 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:54:47,062-Speed 3309.01 samples/sec Loss 0.7160 LearningRate 0.0000 Epoch: 19 Global Step: 244010 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:54:50,116-Speed 3353.68 samples/sec Loss 0.7156 LearningRate 0.0000 Epoch: 19 Global Step: 244020 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:54:53,231-Speed 3288.63 samples/sec Loss 0.7421 LearningRate 0.0000 Epoch: 19 Global 
Step: 244030 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:54:56,333-Speed 3302.56 samples/sec Loss 0.7305 LearningRate 0.0000 Epoch: 19 Global Step: 244040 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:54:59,472-Speed 3262.76 samples/sec Loss 0.7366 LearningRate 0.0000 Epoch: 19 Global Step: 244050 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:55:02,587-Speed 3288.30 samples/sec Loss 0.7054 LearningRate 0.0000 Epoch: 19 Global Step: 244060 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:55:05,720-Speed 3269.26 samples/sec Loss 0.7353 LearningRate 0.0000 Epoch: 19 Global Step: 244070 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:55:08,780-Speed 3347.71 samples/sec Loss 0.7023 LearningRate 0.0000 Epoch: 19 Global Step: 244080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:11,929-Speed 3253.20 samples/sec Loss 0.7337 LearningRate 0.0000 Epoch: 19 Global Step: 244090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:15,041-Speed 3290.84 samples/sec Loss 0.7239 LearningRate 0.0000 Epoch: 19 Global Step: 244100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:18,217-Speed 3225.97 samples/sec Loss 0.7307 LearningRate 0.0000 Epoch: 19 Global Step: 244110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:21,318-Speed 3302.85 samples/sec Loss 0.6906 LearningRate 0.0000 Epoch: 19 Global Step: 244120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:24,447-Speed 3272.95 samples/sec Loss 0.7175 LearningRate 0.0000 Epoch: 19 Global Step: 244130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:27,589-Speed 3260.91 samples/sec Loss 0.7194 LearningRate 0.0000 Epoch: 19 Global Step: 244140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:30,730-Speed 3261.27 samples/sec Loss 0.6808 LearningRate 0.0000 Epoch: 19 Global Step: 244150 Fp16 Grad Scale: 16384 Required: 0 
hours Training: 2022-04-27 22:55:33,824-Speed 3310.40 samples/sec Loss 0.7254 LearningRate 0.0000 Epoch: 19 Global Step: 244160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:36,936-Speed 3291.51 samples/sec Loss 0.7185 LearningRate 0.0000 Epoch: 19 Global Step: 244170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:40,050-Speed 3288.89 samples/sec Loss 0.7049 LearningRate 0.0000 Epoch: 19 Global Step: 244180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:55:43,166-Speed 3287.71 samples/sec Loss 0.7411 LearningRate 0.0000 Epoch: 19 Global Step: 244190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:46,219-Speed 3354.95 samples/sec Loss 0.7057 LearningRate 0.0000 Epoch: 19 Global Step: 244200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:49,338-Speed 3283.57 samples/sec Loss 0.7236 LearningRate 0.0000 Epoch: 19 Global Step: 244210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:52,474-Speed 3267.38 samples/sec Loss 0.7315 LearningRate 0.0000 Epoch: 19 Global Step: 244220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:55,583-Speed 3294.09 samples/sec Loss 0.7426 LearningRate 0.0000 Epoch: 19 Global Step: 244230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:55:58,645-Speed 3345.09 samples/sec Loss 0.7403 LearningRate 0.0000 Epoch: 19 Global Step: 244240 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:01,832-Speed 3214.33 samples/sec Loss 0.7111 LearningRate 0.0000 Epoch: 19 Global Step: 244250 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:04,962-Speed 3272.45 samples/sec Loss 0.7254 LearningRate 0.0000 Epoch: 19 Global Step: 244260 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:08,094-Speed 3270.16 samples/sec Loss 0.7390 LearningRate 0.0000 Epoch: 19 Global Step: 244270 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:11,215-Speed 
3282.14 samples/sec Loss 0.7091 LearningRate 0.0000 Epoch: 19 Global Step: 244280 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:14,339-Speed 3279.60 samples/sec Loss 0.7156 LearningRate 0.0000 Epoch: 19 Global Step: 244290 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:17,532-Speed 3207.80 samples/sec Loss 0.7312 LearningRate 0.0000 Epoch: 19 Global Step: 244300 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:20,616-Speed 3321.39 samples/sec Loss 0.6985 LearningRate 0.0000 Epoch: 19 Global Step: 244310 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:23,750-Speed 3268.73 samples/sec Loss 0.6964 LearningRate 0.0000 Epoch: 19 Global Step: 244320 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:26,905-Speed 3246.11 samples/sec Loss 0.7394 LearningRate 0.0000 Epoch: 19 Global Step: 244330 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:56:30,046-Speed 3261.34 samples/sec Loss 0.6837 LearningRate 0.0000 Epoch: 19 Global Step: 244340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:33,174-Speed 3274.65 samples/sec Loss 0.7124 LearningRate 0.0000 Epoch: 19 Global Step: 244350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:36,331-Speed 3245.32 samples/sec Loss 0.7539 LearningRate 0.0000 Epoch: 19 Global Step: 244360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:39,424-Speed 3311.35 samples/sec Loss 0.7119 LearningRate 0.0000 Epoch: 19 Global Step: 244370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:42,512-Speed 3316.54 samples/sec Loss 0.7483 LearningRate 0.0000 Epoch: 19 Global Step: 244380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:45,621-Speed 3297.24 samples/sec Loss 0.7475 LearningRate 0.0000 Epoch: 19 Global Step: 244390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:48,742-Speed 3281.54 samples/sec Loss 0.7334 LearningRate 
0.0000 Epoch: 19 Global Step: 244400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:51,871-Speed 3273.63 samples/sec Loss 0.7210 LearningRate 0.0000 Epoch: 19 Global Step: 244410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:55,001-Speed 3272.59 samples/sec Loss 0.7196 LearningRate 0.0000 Epoch: 19 Global Step: 244420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:56:58,075-Speed 3331.80 samples/sec Loss 0.7092 LearningRate 0.0000 Epoch: 19 Global Step: 244430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:57:01,205-Speed 3273.05 samples/sec Loss 0.7087 LearningRate 0.0000 Epoch: 19 Global Step: 244440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:04,333-Speed 3274.84 samples/sec Loss 0.6984 LearningRate 0.0000 Epoch: 19 Global Step: 244450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:07,417-Speed 3321.33 samples/sec Loss 0.7422 LearningRate 0.0000 Epoch: 19 Global Step: 244460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:10,528-Speed 3292.16 samples/sec Loss 0.7204 LearningRate 0.0000 Epoch: 19 Global Step: 244470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:13,648-Speed 3283.54 samples/sec Loss 0.7412 LearningRate 0.0000 Epoch: 19 Global Step: 244480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:16,744-Speed 3308.82 samples/sec Loss 0.7070 LearningRate 0.0000 Epoch: 19 Global Step: 244490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:19,824-Speed 3325.55 samples/sec Loss 0.7202 LearningRate 0.0000 Epoch: 19 Global Step: 244500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:22,909-Speed 3320.29 samples/sec Loss 0.7129 LearningRate 0.0000 Epoch: 19 Global Step: 244510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:26,036-Speed 3275.89 samples/sec Loss 0.7064 LearningRate 0.0000 Epoch: 19 Global Step: 244520 Fp16 
Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:29,144-Speed 3295.38 samples/sec Loss 0.7191 LearningRate 0.0000 Epoch: 19 Global Step: 244530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:32,244-Speed 3304.96 samples/sec Loss 0.7145 LearningRate 0.0000 Epoch: 19 Global Step: 244540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:35,420-Speed 3224.85 samples/sec Loss 0.7591 LearningRate 0.0000 Epoch: 19 Global Step: 244550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:57:38,546-Speed 3277.03 samples/sec Loss 0.7256 LearningRate 0.0000 Epoch: 19 Global Step: 244560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:57:41,715-Speed 3232.40 samples/sec Loss 0.7410 LearningRate 0.0000 Epoch: 19 Global Step: 244570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:57:44,804-Speed 3315.66 samples/sec Loss 0.7628 LearningRate 0.0000 Epoch: 19 Global Step: 244580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:57:47,945-Speed 3261.11 samples/sec Loss 0.6937 LearningRate 0.0000 Epoch: 19 Global Step: 244590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:57:51,132-Speed 3214.19 samples/sec Loss 0.7020 LearningRate 0.0000 Epoch: 19 Global Step: 244600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:57:54,322-Speed 3210.56 samples/sec Loss 0.7270 LearningRate 0.0000 Epoch: 19 Global Step: 244610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:57:57,430-Speed 3295.95 samples/sec Loss 0.7323 LearningRate 0.0000 Epoch: 19 Global Step: 244620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:58:00,503-Speed 3333.66 samples/sec Loss 0.7599 LearningRate 0.0000 Epoch: 19 Global Step: 244630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:58:03,595-Speed 3312.55 samples/sec Loss 0.7154 LearningRate 0.0000 Epoch: 19 Global Step: 244640 Fp16 Grad Scale: 16384 Required: 0 hours 
Training: 2022-04-27 22:58:06,791-Speed 3205.88 samples/sec Loss 0.7067 LearningRate 0.0000 Epoch: 19 Global Step: 244650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:58:09,878-Speed 3317.82 samples/sec Loss 0.7505 LearningRate 0.0000 Epoch: 19 Global Step: 244660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:58:12,972-Speed 3310.94 samples/sec Loss 0.7436 LearningRate 0.0000 Epoch: 19 Global Step: 244670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:58:16,145-Speed 3228.11 samples/sec Loss 0.7311 LearningRate 0.0000 Epoch: 19 Global Step: 244680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:58:19,316-Speed 3230.12 samples/sec Loss 0.6853 LearningRate 0.0000 Epoch: 19 Global Step: 244690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:58:22,401-Speed 3320.71 samples/sec Loss 0.7287 LearningRate 0.0000 Epoch: 19 Global Step: 244700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:58:25,585-Speed 3216.83 samples/sec Loss 0.7115 LearningRate 0.0000 Epoch: 19 Global Step: 244710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:58:28,741-Speed 3245.29 samples/sec Loss 0.7310 LearningRate 0.0000 Epoch: 19 Global Step: 244720 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:31,848-Speed 3296.96 samples/sec Loss 0.7525 LearningRate 0.0000 Epoch: 19 Global Step: 244730 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:34,930-Speed 3323.51 samples/sec Loss 0.7177 LearningRate 0.0000 Epoch: 19 Global Step: 244740 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:38,057-Speed 3276.58 samples/sec Loss 0.7199 LearningRate 0.0000 Epoch: 19 Global Step: 244750 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:41,166-Speed 3294.80 samples/sec Loss 0.6967 LearningRate 0.0000 Epoch: 19 Global Step: 244760 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:44,302-Speed 3265.28 
samples/sec Loss 0.6879 LearningRate 0.0000 Epoch: 19 Global Step: 244770 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:47,426-Speed 3279.75 samples/sec Loss 0.7161 LearningRate 0.0000 Epoch: 19 Global Step: 244780 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:50,510-Speed 3320.49 samples/sec Loss 0.7354 LearningRate 0.0000 Epoch: 19 Global Step: 244790 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:53,646-Speed 3266.96 samples/sec Loss 0.7127 LearningRate 0.0000 Epoch: 19 Global Step: 244800 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:56,691-Speed 3363.78 samples/sec Loss 0.6852 LearningRate 0.0000 Epoch: 19 Global Step: 244810 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:58:59,749-Speed 3349.66 samples/sec Loss 0.7162 LearningRate 0.0000 Epoch: 19 Global Step: 244820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:02,865-Speed 3287.45 samples/sec Loss 0.7292 LearningRate 0.0000 Epoch: 19 Global Step: 244830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:05,960-Speed 3310.15 samples/sec Loss 0.7141 LearningRate 0.0000 Epoch: 19 Global Step: 244840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:09,074-Speed 3289.07 samples/sec Loss 0.7498 LearningRate 0.0000 Epoch: 19 Global Step: 244850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:12,170-Speed 3308.62 samples/sec Loss 0.6817 LearningRate 0.0000 Epoch: 19 Global Step: 244860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:15,309-Speed 3263.57 samples/sec Loss 0.7632 LearningRate 0.0000 Epoch: 19 Global Step: 244870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:18,459-Speed 3251.59 samples/sec Loss 0.7367 LearningRate 0.0000 Epoch: 19 Global Step: 244880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:21,535-Speed 3329.93 samples/sec Loss 0.7206 LearningRate 0.0000 
Epoch: 19 Global Step: 244890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:24,641-Speed 3298.50 samples/sec Loss 0.6799 LearningRate 0.0000 Epoch: 19 Global Step: 244900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:27,761-Speed 3283.06 samples/sec Loss 0.7134 LearningRate 0.0000 Epoch: 19 Global Step: 244910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:30,889-Speed 3274.90 samples/sec Loss 0.7243 LearningRate 0.0000 Epoch: 19 Global Step: 244920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 22:59:33,949-Speed 3346.45 samples/sec Loss 0.7277 LearningRate 0.0000 Epoch: 19 Global Step: 244930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:37,041-Speed 3314.13 samples/sec Loss 0.7230 LearningRate 0.0000 Epoch: 19 Global Step: 244940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 22:59:40,071-Speed 3380.21 samples/sec Loss 0.7115 LearningRate 0.0000 Epoch: 19 Global Step: 244950 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:59:43,190-Speed 3283.61 samples/sec Loss 0.7361 LearningRate 0.0000 Epoch: 19 Global Step: 244960 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:59:46,272-Speed 3323.44 samples/sec Loss 0.7228 LearningRate 0.0000 Epoch: 19 Global Step: 244970 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:59:49,453-Speed 3220.89 samples/sec Loss 0.7412 LearningRate 0.0000 Epoch: 19 Global Step: 244980 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:59:52,547-Speed 3310.34 samples/sec Loss 0.7456 LearningRate 0.0000 Epoch: 19 Global Step: 244990 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:59:55,695-Speed 3253.98 samples/sec Loss 0.7160 LearningRate 0.0000 Epoch: 19 Global Step: 245000 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 22:59:58,776-Speed 3324.98 samples/sec Loss 0.7089 LearningRate 0.0000 Epoch: 19 Global Step: 245010 Fp16 Grad Scale: 
8192 Required: 0 hours Training: 2022-04-27 23:00:01,942-Speed 3234.74 samples/sec Loss 0.7098 LearningRate 0.0000 Epoch: 19 Global Step: 245020 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:05,036-Speed 3311.47 samples/sec Loss 0.7280 LearningRate 0.0000 Epoch: 19 Global Step: 245030 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:08,178-Speed 3259.58 samples/sec Loss 0.7229 LearningRate 0.0000 Epoch: 19 Global Step: 245040 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:11,296-Speed 3285.40 samples/sec Loss 0.6885 LearningRate 0.0000 Epoch: 19 Global Step: 245050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:00:14,390-Speed 3310.58 samples/sec Loss 0.6976 LearningRate 0.0000 Epoch: 19 Global Step: 245060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:00:17,469-Speed 3327.42 samples/sec Loss 0.7413 LearningRate 0.0000 Epoch: 19 Global Step: 245070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:00:20,565-Speed 3309.40 samples/sec Loss 0.7407 LearningRate 0.0000 Epoch: 19 Global Step: 245080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:00:23,708-Speed 3258.98 samples/sec Loss 0.7589 LearningRate 0.0000 Epoch: 19 Global Step: 245090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:00:26,799-Speed 3313.54 samples/sec Loss 0.7389 LearningRate 0.0000 Epoch: 19 Global Step: 245100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:00:29,918-Speed 3284.31 samples/sec Loss 0.7147 LearningRate 0.0000 Epoch: 19 Global Step: 245110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:00:32,981-Speed 3344.18 samples/sec Loss 0.7088 LearningRate 0.0000 Epoch: 19 Global Step: 245120 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:36,114-Speed 3269.42 samples/sec Loss 0.7302 LearningRate 0.0000 Epoch: 19 Global Step: 245130 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 
23:00:39,223-Speed 3295.12 samples/sec Loss 0.7013 LearningRate 0.0000 Epoch: 19 Global Step: 245140 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:42,370-Speed 3253.89 samples/sec Loss 0.7338 LearningRate 0.0000 Epoch: 19 Global Step: 245150 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:45,446-Speed 3330.83 samples/sec Loss 0.7135 LearningRate 0.0000 Epoch: 19 Global Step: 245160 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:48,523-Speed 3329.07 samples/sec Loss 0.7142 LearningRate 0.0000 Epoch: 19 Global Step: 245170 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:51,671-Speed 3253.16 samples/sec Loss 0.7284 LearningRate 0.0000 Epoch: 19 Global Step: 245180 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:54,753-Speed 3324.08 samples/sec Loss 0.7585 LearningRate 0.0000 Epoch: 19 Global Step: 245190 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:00:57,831-Speed 3327.70 samples/sec Loss 0.7315 LearningRate 0.0000 Epoch: 19 Global Step: 245200 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:01:00,897-Speed 3341.11 samples/sec Loss 0.7325 LearningRate 0.0000 Epoch: 19 Global Step: 245210 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:01:04,007-Speed 3293.78 samples/sec Loss 0.6981 LearningRate 0.0000 Epoch: 19 Global Step: 245220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:07,112-Speed 3298.38 samples/sec Loss 0.7224 LearningRate 0.0000 Epoch: 19 Global Step: 245230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:10,158-Speed 3363.02 samples/sec Loss 0.7016 LearningRate 0.0000 Epoch: 19 Global Step: 245240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:13,310-Speed 3249.47 samples/sec Loss 0.7045 LearningRate 0.0000 Epoch: 19 Global Step: 245250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:16,420-Speed 3294.11 samples/sec Loss 0.7073 
LearningRate 0.0000 Epoch: 19 Global Step: 245260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:19,499-Speed 3326.11 samples/sec Loss 0.7005 LearningRate 0.0000 Epoch: 19 Global Step: 245270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:22,598-Speed 3305.91 samples/sec Loss 0.7211 LearningRate 0.0000 Epoch: 19 Global Step: 245280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:25,666-Speed 3338.80 samples/sec Loss 0.7274 LearningRate 0.0000 Epoch: 19 Global Step: 245290 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:28,762-Speed 3308.22 samples/sec Loss 0.7032 LearningRate 0.0000 Epoch: 19 Global Step: 245300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:31,815-Speed 3354.86 samples/sec Loss 0.6988 LearningRate 0.0000 Epoch: 19 Global Step: 245310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:34,879-Speed 3343.56 samples/sec Loss 0.7291 LearningRate 0.0000 Epoch: 19 Global Step: 245320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:01:37,969-Speed 3314.40 samples/sec Loss 0.6804 LearningRate 0.0000 Epoch: 19 Global Step: 245330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:01:41,102-Speed 3270.41 samples/sec Loss 0.6901 LearningRate 0.0000 Epoch: 19 Global Step: 245340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:01:44,179-Speed 3328.56 samples/sec Loss 0.7233 LearningRate 0.0000 Epoch: 19 Global Step: 245350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:01:47,270-Speed 3314.50 samples/sec Loss 0.7131 LearningRate 0.0000 Epoch: 19 Global Step: 245360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:50,372-Speed 3300.95 samples/sec Loss 0.7254 LearningRate 0.0000 Epoch: 19 Global Step: 245370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:53,430-Speed 3349.92 samples/sec Loss 0.7149 LearningRate 0.0000 Epoch: 19 Global Step: 
245380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:56,513-Speed 3322.78 samples/sec Loss 0.7410 LearningRate 0.0000 Epoch: 19 Global Step: 245390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:01:59,603-Speed 3314.82 samples/sec Loss 0.7581 LearningRate 0.0000 Epoch: 19 Global Step: 245400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:02,724-Speed 3281.63 samples/sec Loss 0.7127 LearningRate 0.0000 Epoch: 19 Global Step: 245410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:05,831-Speed 3297.51 samples/sec Loss 0.7527 LearningRate 0.0000 Epoch: 19 Global Step: 245420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:08,874-Speed 3365.64 samples/sec Loss 0.6919 LearningRate 0.0000 Epoch: 19 Global Step: 245430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:11,948-Speed 3332.66 samples/sec Loss 0.6936 LearningRate 0.0000 Epoch: 19 Global Step: 245440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:15,115-Speed 3234.76 samples/sec Loss 0.7125 LearningRate 0.0000 Epoch: 19 Global Step: 245450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:18,240-Speed 3277.39 samples/sec Loss 0.7192 LearningRate 0.0000 Epoch: 19 Global Step: 245460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:21,326-Speed 3319.06 samples/sec Loss 0.7165 LearningRate 0.0000 Epoch: 19 Global Step: 245470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:24,429-Speed 3301.13 samples/sec Loss 0.7381 LearningRate 0.0000 Epoch: 19 Global Step: 245480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:27,598-Speed 3231.87 samples/sec Loss 0.6804 LearningRate 0.0000 Epoch: 19 Global Step: 245490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:30,705-Speed 3297.14 samples/sec Loss 0.7054 LearningRate 0.0000 Epoch: 19 Global Step: 245500 Fp16 Grad Scale: 16384 Required: 0 
hours Training: 2022-04-27 23:02:33,767-Speed 3345.21 samples/sec Loss 0.7437 LearningRate 0.0000 Epoch: 19 Global Step: 245510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:36,894-Speed 3275.10 samples/sec Loss 0.6771 LearningRate 0.0000 Epoch: 19 Global Step: 245520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:40,118-Speed 3177.42 samples/sec Loss 0.7025 LearningRate 0.0000 Epoch: 19 Global Step: 245530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:43,262-Speed 3258.07 samples/sec Loss 0.7242 LearningRate 0.0000 Epoch: 19 Global Step: 245540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:46,348-Speed 3319.65 samples/sec Loss 0.7520 LearningRate 0.0000 Epoch: 19 Global Step: 245550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:49,514-Speed 3235.76 samples/sec Loss 0.7284 LearningRate 0.0000 Epoch: 19 Global Step: 245560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:02:52,619-Speed 3299.15 samples/sec Loss 0.7266 LearningRate 0.0000 Epoch: 19 Global Step: 245570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:02:55,725-Speed 3297.64 samples/sec Loss 0.7335 LearningRate 0.0000 Epoch: 19 Global Step: 245580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:02:58,859-Speed 3268.35 samples/sec Loss 0.7420 LearningRate 0.0000 Epoch: 19 Global Step: 245590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:01,969-Speed 3293.43 samples/sec Loss 0.6968 LearningRate 0.0000 Epoch: 19 Global Step: 245600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:05,083-Speed 3289.77 samples/sec Loss 0.7297 LearningRate 0.0000 Epoch: 19 Global Step: 245610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:08,207-Speed 3278.87 samples/sec Loss 0.7401 LearningRate 0.0000 Epoch: 19 Global Step: 245620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 
23:03:11,295-Speed 3316.42 samples/sec Loss 0.7229 LearningRate 0.0000 Epoch: 19 Global Step: 245630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:14,453-Speed 3244.06 samples/sec Loss 0.7439 LearningRate 0.0000 Epoch: 19 Global Step: 245640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:17,541-Speed 3316.71 samples/sec Loss 0.7519 LearningRate 0.0000 Epoch: 19 Global Step: 245650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:20,657-Speed 3288.09 samples/sec Loss 0.7274 LearningRate 0.0000 Epoch: 19 Global Step: 245660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:23,804-Speed 3254.20 samples/sec Loss 0.7091 LearningRate 0.0000 Epoch: 19 Global Step: 245670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:26,927-Speed 3280.17 samples/sec Loss 0.7384 LearningRate 0.0000 Epoch: 19 Global Step: 245680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:03:30,067-Speed 3262.64 samples/sec Loss 0.7248 LearningRate 0.0000 Epoch: 19 Global Step: 245690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:03:33,174-Speed 3296.63 samples/sec Loss 0.7251 LearningRate 0.0000 Epoch: 19 Global Step: 245700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:03:36,317-Speed 3259.56 samples/sec Loss 0.7296 LearningRate 0.0000 Epoch: 19 Global Step: 245710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:39,462-Speed 3256.77 samples/sec Loss 0.6897 LearningRate 0.0000 Epoch: 19 Global Step: 245720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:42,567-Speed 3297.86 samples/sec Loss 0.7309 LearningRate 0.0000 Epoch: 19 Global Step: 245730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:45,696-Speed 3274.34 samples/sec Loss 0.6925 LearningRate 0.0000 Epoch: 19 Global Step: 245740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:48,818-Speed 3281.45 samples/sec Loss 
0.7153 LearningRate 0.0000 Epoch: 19 Global Step: 245750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:51,925-Speed 3296.63 samples/sec Loss 0.7226 LearningRate 0.0000 Epoch: 19 Global Step: 245760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:55,032-Speed 3296.43 samples/sec Loss 0.7627 LearningRate 0.0000 Epoch: 19 Global Step: 245770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:03:58,102-Speed 3336.70 samples/sec Loss 0.7235 LearningRate 0.0000 Epoch: 19 Global Step: 245780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:01,249-Speed 3254.26 samples/sec Loss 0.7174 LearningRate 0.0000 Epoch: 19 Global Step: 245790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:04,372-Speed 3280.33 samples/sec Loss 0.7069 LearningRate 0.0000 Epoch: 19 Global Step: 245800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:07,507-Speed 3267.76 samples/sec Loss 0.7399 LearningRate 0.0000 Epoch: 19 Global Step: 245810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:04:10,605-Speed 3306.45 samples/sec Loss 0.7303 LearningRate 0.0000 Epoch: 19 Global Step: 245820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:04:13,716-Speed 3291.75 samples/sec Loss 0.7251 LearningRate 0.0000 Epoch: 19 Global Step: 245830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:04:16,820-Speed 3300.36 samples/sec Loss 0.7404 LearningRate 0.0000 Epoch: 19 Global Step: 245840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:04:19,924-Speed 3300.06 samples/sec Loss 0.7226 LearningRate 0.0000 Epoch: 19 Global Step: 245850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:04:23,023-Speed 3305.55 samples/sec Loss 0.7353 LearningRate 0.0000 Epoch: 19 Global Step: 245860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:04:26,124-Speed 3303.03 samples/sec Loss 0.7205 LearningRate 0.0000 Epoch: 19 Global 
Step: 245870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:29,265-Speed 3261.37 samples/sec Loss 0.7207 LearningRate 0.0000 Epoch: 19 Global Step: 245880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:32,375-Speed 3293.00 samples/sec Loss 0.7266 LearningRate 0.0000 Epoch: 19 Global Step: 245890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:35,499-Speed 3278.84 samples/sec Loss 0.7283 LearningRate 0.0000 Epoch: 19 Global Step: 245900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:38,596-Speed 3307.49 samples/sec Loss 0.6837 LearningRate 0.0000 Epoch: 19 Global Step: 245910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:41,836-Speed 3161.22 samples/sec Loss 0.7228 LearningRate 0.0000 Epoch: 19 Global Step: 245920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:44,964-Speed 3274.99 samples/sec Loss 0.7217 LearningRate 0.0000 Epoch: 19 Global Step: 245930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:48,105-Speed 3261.90 samples/sec Loss 0.7182 LearningRate 0.0000 Epoch: 19 Global Step: 245940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:51,185-Speed 3325.33 samples/sec Loss 0.7116 LearningRate 0.0000 Epoch: 19 Global Step: 245950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:04:54,285-Speed 3303.58 samples/sec Loss 0.7224 LearningRate 0.0000 Epoch: 19 Global Step: 245960 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:04:57,338-Speed 3356.01 samples/sec Loss 0.7052 LearningRate 0.0000 Epoch: 19 Global Step: 245970 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:00,450-Speed 3290.86 samples/sec Loss 0.7088 LearningRate 0.0000 Epoch: 19 Global Step: 245980 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:03,543-Speed 3311.97 samples/sec Loss 0.7320 LearningRate 0.0000 Epoch: 19 Global Step: 245990 Fp16 Grad Scale: 8192 Required: 0 
hours Training: 2022-04-27 23:05:06,643-Speed 3303.78 samples/sec Loss 0.7187 LearningRate 0.0000 Epoch: 19 Global Step: 246000 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:09,690-Speed 3361.84 samples/sec Loss 0.7231 LearningRate 0.0000 Epoch: 19 Global Step: 246010 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:12,829-Speed 3263.43 samples/sec Loss 0.7441 LearningRate 0.0000 Epoch: 19 Global Step: 246020 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:15,906-Speed 3328.75 samples/sec Loss 0.7155 LearningRate 0.0000 Epoch: 19 Global Step: 246030 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:19,045-Speed 3263.66 samples/sec Loss 0.7065 LearningRate 0.0000 Epoch: 19 Global Step: 246040 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:22,099-Speed 3354.54 samples/sec Loss 0.7292 LearningRate 0.0000 Epoch: 19 Global Step: 246050 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:25,181-Speed 3323.37 samples/sec Loss 0.7295 LearningRate 0.0000 Epoch: 19 Global Step: 246060 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:28,303-Speed 3280.90 samples/sec Loss 0.7472 LearningRate 0.0000 Epoch: 19 Global Step: 246070 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:31,421-Speed 3285.22 samples/sec Loss 0.7547 LearningRate 0.0000 Epoch: 19 Global Step: 246080 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:34,502-Speed 3324.37 samples/sec Loss 0.7210 LearningRate 0.0000 Epoch: 19 Global Step: 246090 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:37,653-Speed 3250.44 samples/sec Loss 0.7151 LearningRate 0.0000 Epoch: 19 Global Step: 246100 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:05:40,752-Speed 3305.91 samples/sec Loss 0.7290 LearningRate 0.0000 Epoch: 19 Global Step: 246110 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:05:43,864-Speed 3291.26 
samples/sec Loss 0.7473 LearningRate 0.0000 Epoch: 19 Global Step: 246120 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:05:47,049-Speed 3215.72 samples/sec Loss 0.7198 LearningRate 0.0000 Epoch: 19 Global Step: 246130 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:05:50,135-Speed 3319.88 samples/sec Loss 0.7088 LearningRate 0.0000 Epoch: 19 Global Step: 246140 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:05:53,225-Speed 3315.31 samples/sec Loss 0.7190 LearningRate 0.0000 Epoch: 19 Global Step: 246150 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:05:56,312-Speed 3317.57 samples/sec Loss 0.6849 LearningRate 0.0000 Epoch: 19 Global Step: 246160 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:05:59,371-Speed 3348.42 samples/sec Loss 0.7358 LearningRate 0.0000 Epoch: 19 Global Step: 246170 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:06:02,521-Speed 3251.67 samples/sec Loss 0.7050 LearningRate 0.0000 Epoch: 19 Global Step: 246180 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:06:05,615-Speed 3310.97 samples/sec Loss 0.7157 LearningRate 0.0000 Epoch: 19 Global Step: 246190 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:06:08,718-Speed 3301.41 samples/sec Loss 0.7508 LearningRate 0.0000 Epoch: 19 Global Step: 246200 Fp16 Grad Scale: 4096 Required: 0 hours Training: 2022-04-27 23:06:11,897-Speed 3221.30 samples/sec Loss 0.6828 LearningRate 0.0000 Epoch: 19 Global Step: 246210 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:14,967-Speed 3336.37 samples/sec Loss 0.7379 LearningRate 0.0000 Epoch: 19 Global Step: 246220 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:18,029-Speed 3346.11 samples/sec Loss 0.7348 LearningRate 0.0000 Epoch: 19 Global Step: 246230 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:21,086-Speed 3351.10 samples/sec Loss 0.7411 LearningRate 0.0000 Epoch: 19 
Global Step: 246240 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:24,157-Speed 3335.75 samples/sec Loss 0.7112 LearningRate 0.0000 Epoch: 19 Global Step: 246250 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:27,212-Speed 3353.32 samples/sec Loss 0.7250 LearningRate 0.0000 Epoch: 19 Global Step: 246260 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:30,271-Speed 3348.35 samples/sec Loss 0.7279 LearningRate 0.0000 Epoch: 19 Global Step: 246270 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:33,341-Speed 3336.65 samples/sec Loss 0.7013 LearningRate 0.0000 Epoch: 19 Global Step: 246280 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:36,476-Speed 3267.67 samples/sec Loss 0.6702 LearningRate 0.0000 Epoch: 19 Global Step: 246290 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:39,614-Speed 3263.90 samples/sec Loss 0.7259 LearningRate 0.0000 Epoch: 19 Global Step: 246300 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:06:42,795-Speed 3220.58 samples/sec Loss 0.7422 LearningRate 0.0000 Epoch: 19 Global Step: 246310 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:06:45,880-Speed 3319.64 samples/sec Loss 0.7300 LearningRate 0.0000 Epoch: 19 Global Step: 246320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:06:48,940-Speed 3347.60 samples/sec Loss 0.6996 LearningRate 0.0000 Epoch: 19 Global Step: 246330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:06:52,016-Speed 3330.71 samples/sec Loss 0.6883 LearningRate 0.0000 Epoch: 19 Global Step: 246340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:06:55,177-Speed 3239.41 samples/sec Loss 0.7368 LearningRate 0.0000 Epoch: 19 Global Step: 246350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:06:58,269-Speed 3313.23 samples/sec Loss 0.7398 LearningRate 0.0000 Epoch: 19 Global Step: 246360 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-27 23:07:01,414-Speed 3257.67 samples/sec Loss 0.7568 LearningRate 0.0000 Epoch: 19 Global Step: 246370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:04,487-Speed 3333.09 samples/sec Loss 0.6689 LearningRate 0.0000 Epoch: 19 Global Step: 246380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:07,590-Speed 3301.45 samples/sec Loss 0.7242 LearningRate 0.0000 Epoch: 19 Global Step: 246390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:10,704-Speed 3289.45 samples/sec Loss 0.6967 LearningRate 0.0000 Epoch: 19 Global Step: 246400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:13,817-Speed 3290.63 samples/sec Loss 0.7283 LearningRate 0.0000 Epoch: 19 Global Step: 246410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:07:16,898-Speed 3324.08 samples/sec Loss 0.7327 LearningRate 0.0000 Epoch: 19 Global Step: 246420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:07:19,978-Speed 3325.63 samples/sec Loss 0.7553 LearningRate 0.0000 Epoch: 19 Global Step: 246430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:07:23,053-Speed 3331.60 samples/sec Loss 0.7195 LearningRate 0.0000 Epoch: 19 Global Step: 246440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:07:26,105-Speed 3356.31 samples/sec Loss 0.7420 LearningRate 0.0000 Epoch: 19 Global Step: 246450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:29,183-Speed 3327.48 samples/sec Loss 0.7263 LearningRate 0.0000 Epoch: 19 Global Step: 246460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:32,256-Speed 3333.60 samples/sec Loss 0.7311 LearningRate 0.0000 Epoch: 19 Global Step: 246470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:35,350-Speed 3310.33 samples/sec Loss 0.7048 LearningRate 0.0000 Epoch: 19 Global Step: 246480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 
23:07:38,407-Speed 3350.76 samples/sec Loss 0.7199 LearningRate 0.0000 Epoch: 19 Global Step: 246490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:41,517-Speed 3293.47 samples/sec Loss 0.7243 LearningRate 0.0000 Epoch: 19 Global Step: 246500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:44,577-Speed 3347.24 samples/sec Loss 0.7294 LearningRate 0.0000 Epoch: 19 Global Step: 246510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:47,712-Speed 3268.25 samples/sec Loss 0.7338 LearningRate 0.0000 Epoch: 19 Global Step: 246520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:50,961-Speed 3152.09 samples/sec Loss 0.7114 LearningRate 0.0000 Epoch: 19 Global Step: 246530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:54,015-Speed 3353.55 samples/sec Loss 0.7159 LearningRate 0.0000 Epoch: 19 Global Step: 246540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:07:57,092-Speed 3329.56 samples/sec Loss 0.7556 LearningRate 0.0000 Epoch: 19 Global Step: 246550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:00,185-Speed 3311.94 samples/sec Loss 0.7334 LearningRate 0.0000 Epoch: 19 Global Step: 246560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:03,340-Speed 3246.01 samples/sec Loss 0.6904 LearningRate 0.0000 Epoch: 19 Global Step: 246570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:06,519-Speed 3222.64 samples/sec Loss 0.7434 LearningRate 0.0000 Epoch: 19 Global Step: 246580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:09,615-Speed 3308.31 samples/sec Loss 0.7113 LearningRate 0.0000 Epoch: 19 Global Step: 246590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:12,669-Speed 3353.78 samples/sec Loss 0.7177 LearningRate 0.0000 Epoch: 19 Global Step: 246600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:15,771-Speed 3302.68 samples/sec Loss 
0.7331 LearningRate 0.0000 Epoch: 19 Global Step: 246610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:18,908-Speed 3265.39 samples/sec Loss 0.7397 LearningRate 0.0000 Epoch: 19 Global Step: 246620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:21,961-Speed 3355.16 samples/sec Loss 0.7225 LearningRate 0.0000 Epoch: 19 Global Step: 246630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:25,035-Speed 3331.45 samples/sec Loss 0.7206 LearningRate 0.0000 Epoch: 19 Global Step: 246640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:28,223-Speed 3213.45 samples/sec Loss 0.7287 LearningRate 0.0000 Epoch: 19 Global Step: 246650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:08:31,331-Speed 3295.52 samples/sec Loss 0.7381 LearningRate 0.0000 Epoch: 19 Global Step: 246660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:34,390-Speed 3349.04 samples/sec Loss 0.7248 LearningRate 0.0000 Epoch: 19 Global Step: 246670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:37,473-Speed 3322.66 samples/sec Loss 0.7049 LearningRate 0.0000 Epoch: 19 Global Step: 246680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:40,575-Speed 3301.58 samples/sec Loss 0.7301 LearningRate 0.0000 Epoch: 19 Global Step: 246690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:43,669-Speed 3311.11 samples/sec Loss 0.7612 LearningRate 0.0000 Epoch: 19 Global Step: 246700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:46,742-Speed 3333.22 samples/sec Loss 0.7212 LearningRate 0.0000 Epoch: 19 Global Step: 246710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:49,862-Speed 3282.93 samples/sec Loss 0.7002 LearningRate 0.0000 Epoch: 19 Global Step: 246720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:52,964-Speed 3302.73 samples/sec Loss 0.7300 LearningRate 0.0000 Epoch: 19 Global 
Step: 246730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:56,014-Speed 3358.10 samples/sec Loss 0.7614 LearningRate 0.0000 Epoch: 19 Global Step: 246740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:08:59,083-Speed 3337.87 samples/sec Loss 0.7480 LearningRate 0.0000 Epoch: 19 Global Step: 246750 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:02,227-Speed 3257.45 samples/sec Loss 0.6973 LearningRate 0.0000 Epoch: 19 Global Step: 246760 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:05,322-Speed 3310.14 samples/sec Loss 0.7293 LearningRate 0.0000 Epoch: 19 Global Step: 246770 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:08,372-Speed 3358.83 samples/sec Loss 0.7279 LearningRate 0.0000 Epoch: 19 Global Step: 246780 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:11,476-Speed 3300.17 samples/sec Loss 0.7167 LearningRate 0.0000 Epoch: 19 Global Step: 246790 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:14,668-Speed 3209.02 samples/sec Loss 0.7151 LearningRate 0.0000 Epoch: 19 Global Step: 246800 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:17,754-Speed 3319.20 samples/sec Loss 0.7570 LearningRate 0.0000 Epoch: 19 Global Step: 246810 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:20,825-Speed 3334.90 samples/sec Loss 0.7104 LearningRate 0.0000 Epoch: 19 Global Step: 246820 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:23,909-Speed 3321.43 samples/sec Loss 0.7591 LearningRate 0.0000 Epoch: 19 Global Step: 246830 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:27,002-Speed 3312.42 samples/sec Loss 0.7231 LearningRate 0.0000 Epoch: 19 Global Step: 246840 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:09:30,144-Speed 3260.01 samples/sec Loss 0.7068 LearningRate 0.0000 Epoch: 19 Global Step: 246850 Fp16 Grad Scale: 16384 Required: 0 hours 
Training: 2022-04-27 23:09:33,226-Speed 3323.22 samples/sec Loss 0.7144 LearningRate 0.0000 Epoch: 19 Global Step: 246860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:36,296-Speed 3336.61 samples/sec Loss 0.7183 LearningRate 0.0000 Epoch: 19 Global Step: 246870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:39,407-Speed 3292.60 samples/sec Loss 0.7069 LearningRate 0.0000 Epoch: 19 Global Step: 246880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:42,565-Speed 3243.91 samples/sec Loss 0.7221 LearningRate 0.0000 Epoch: 19 Global Step: 246890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:45,685-Speed 3283.40 samples/sec Loss 0.7095 LearningRate 0.0000 Epoch: 19 Global Step: 246900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:48,812-Speed 3274.68 samples/sec Loss 0.7192 LearningRate 0.0000 Epoch: 19 Global Step: 246910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:51,885-Speed 3333.40 samples/sec Loss 0.7145 LearningRate 0.0000 Epoch: 19 Global Step: 246920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:54,943-Speed 3350.33 samples/sec Loss 0.7074 LearningRate 0.0000 Epoch: 19 Global Step: 246930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:09:57,997-Speed 3354.21 samples/sec Loss 0.7147 LearningRate 0.0000 Epoch: 19 Global Step: 246940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:01,055-Speed 3348.98 samples/sec Loss 0.7223 LearningRate 0.0000 Epoch: 19 Global Step: 246950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:04,131-Speed 3329.78 samples/sec Loss 0.7051 LearningRate 0.0000 Epoch: 19 Global Step: 246960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:07,203-Speed 3334.45 samples/sec Loss 0.7366 LearningRate 0.0000 Epoch: 19 Global Step: 246970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:10,254-Speed 
3357.23 samples/sec Loss 0.7215 LearningRate 0.0000 Epoch: 19 Global Step: 246980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:13,342-Speed 3317.32 samples/sec Loss 0.7050 LearningRate 0.0000 Epoch: 19 Global Step: 246990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:16,551-Speed 3192.62 samples/sec Loss 0.7240 LearningRate 0.0000 Epoch: 19 Global Step: 247000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:19,682-Speed 3271.14 samples/sec Loss 0.6971 LearningRate 0.0000 Epoch: 19 Global Step: 247010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:22,772-Speed 3314.33 samples/sec Loss 0.7340 LearningRate 0.0000 Epoch: 19 Global Step: 247020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:25,852-Speed 3325.49 samples/sec Loss 0.6943 LearningRate 0.0000 Epoch: 19 Global Step: 247030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:28,970-Speed 3285.58 samples/sec Loss 0.7484 LearningRate 0.0000 Epoch: 19 Global Step: 247040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:32,086-Speed 3287.36 samples/sec Loss 0.7317 LearningRate 0.0000 Epoch: 19 Global Step: 247050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:10:35,141-Speed 3352.91 samples/sec Loss 0.7499 LearningRate 0.0000 Epoch: 19 Global Step: 247060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:38,220-Speed 3327.84 samples/sec Loss 0.7368 LearningRate 0.0000 Epoch: 19 Global Step: 247070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:41,393-Speed 3227.88 samples/sec Loss 0.7182 LearningRate 0.0000 Epoch: 19 Global Step: 247080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:44,504-Speed 3292.75 samples/sec Loss 0.7100 LearningRate 0.0000 Epoch: 19 Global Step: 247090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:47,623-Speed 3284.61 samples/sec Loss 0.7067 
LearningRate 0.0000 Epoch: 19 Global Step: 247100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:50,733-Speed 3293.00 samples/sec Loss 0.7058 LearningRate 0.0000 Epoch: 19 Global Step: 247110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:53,814-Speed 3325.05 samples/sec Loss 0.7244 LearningRate 0.0000 Epoch: 19 Global Step: 247120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:10:56,962-Speed 3254.13 samples/sec Loss 0.7162 LearningRate 0.0000 Epoch: 19 Global Step: 247130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:00,029-Speed 3339.26 samples/sec Loss 0.7645 LearningRate 0.0000 Epoch: 19 Global Step: 247140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:03,085-Speed 3352.77 samples/sec Loss 0.6921 LearningRate 0.0000 Epoch: 19 Global Step: 247150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:06,206-Speed 3281.73 samples/sec Loss 0.7018 LearningRate 0.0000 Epoch: 19 Global Step: 247160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:11:09,270-Speed 3342.88 samples/sec Loss 0.7206 LearningRate 0.0000 Epoch: 19 Global Step: 247170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:12,409-Speed 3263.10 samples/sec Loss 0.7265 LearningRate 0.0000 Epoch: 19 Global Step: 247180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:15,554-Speed 3257.23 samples/sec Loss 0.7405 LearningRate 0.0000 Epoch: 19 Global Step: 247190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:18,693-Speed 3262.93 samples/sec Loss 0.6743 LearningRate 0.0000 Epoch: 19 Global Step: 247200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:21,772-Speed 3327.33 samples/sec Loss 0.7095 LearningRate 0.0000 Epoch: 19 Global Step: 247210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:24,844-Speed 3333.67 samples/sec Loss 0.7184 LearningRate 0.0000 Epoch: 19 Global Step: 
247220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:27,981-Speed 3265.41 samples/sec Loss 0.6944 LearningRate 0.0000 Epoch: 19 Global Step: 247230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:31,091-Speed 3293.44 samples/sec Loss 0.7183 LearningRate 0.0000 Epoch: 19 Global Step: 247240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:34,153-Speed 3345.56 samples/sec Loss 0.7336 LearningRate 0.0000 Epoch: 19 Global Step: 247250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:37,232-Speed 3327.52 samples/sec Loss 0.7162 LearningRate 0.0000 Epoch: 19 Global Step: 247260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:11:40,311-Speed 3326.01 samples/sec Loss 0.7165 LearningRate 0.0000 Epoch: 19 Global Step: 247270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:11:43,378-Speed 3340.09 samples/sec Loss 0.7185 LearningRate 0.0000 Epoch: 19 Global Step: 247280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:11:46,428-Speed 3359.39 samples/sec Loss 0.7102 LearningRate 0.0000 Epoch: 19 Global Step: 247290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:11:49,496-Speed 3338.66 samples/sec Loss 0.7354 LearningRate 0.0000 Epoch: 19 Global Step: 247300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:11:52,581-Speed 3320.08 samples/sec Loss 0.6923 LearningRate 0.0000 Epoch: 19 Global Step: 247310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:11:55,663-Speed 3323.39 samples/sec Loss 0.7314 LearningRate 0.0000 Epoch: 19 Global Step: 247320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:11:58,785-Speed 3281.37 samples/sec Loss 0.7132 LearningRate 0.0000 Epoch: 19 Global Step: 247330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:12:01,943-Speed 3243.36 samples/sec Loss 0.7010 LearningRate 0.0000 Epoch: 19 Global Step: 247340 Fp16 Grad Scale: 32768 Required: 0 
hours Training: 2022-04-27 23:12:05,087-Speed 3258.42 samples/sec Loss 0.7221 LearningRate 0.0000 Epoch: 19 Global Step: 247350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:12:08,178-Speed 3314.43 samples/sec Loss 0.7227 LearningRate 0.0000 Epoch: 19 Global Step: 247360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:11,288-Speed 3292.67 samples/sec Loss 0.7430 LearningRate 0.0000 Epoch: 19 Global Step: 247370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:14,491-Speed 3197.66 samples/sec Loss 0.7098 LearningRate 0.0000 Epoch: 19 Global Step: 247380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:17,626-Speed 3268.01 samples/sec Loss 0.6851 LearningRate 0.0000 Epoch: 19 Global Step: 247390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:20,725-Speed 3305.56 samples/sec Loss 0.7157 LearningRate 0.0000 Epoch: 19 Global Step: 247400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:23,838-Speed 3289.66 samples/sec Loss 0.7410 LearningRate 0.0000 Epoch: 19 Global Step: 247410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:26,971-Speed 3270.50 samples/sec Loss 0.7689 LearningRate 0.0000 Epoch: 19 Global Step: 247420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:30,105-Speed 3267.83 samples/sec Loss 0.7168 LearningRate 0.0000 Epoch: 19 Global Step: 247430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:33,221-Speed 3288.08 samples/sec Loss 0.7157 LearningRate 0.0000 Epoch: 19 Global Step: 247440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:36,291-Speed 3336.47 samples/sec Loss 0.6811 LearningRate 0.0000 Epoch: 19 Global Step: 247450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:39,357-Speed 3341.41 samples/sec Loss 0.7375 LearningRate 0.0000 Epoch: 19 Global Step: 247460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 
23:12:42,475-Speed 3284.79 samples/sec Loss 0.7298 LearningRate 0.0000 Epoch: 19 Global Step: 247470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:12:45,547-Speed 3334.05 samples/sec Loss 0.6908 LearningRate 0.0000 Epoch: 19 Global Step: 247480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:12:48,673-Speed 3277.41 samples/sec Loss 0.7138 LearningRate 0.0000 Epoch: 19 Global Step: 247490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:12:51,806-Speed 3269.53 samples/sec Loss 0.7858 LearningRate 0.0000 Epoch: 19 Global Step: 247500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:54,986-Speed 3220.95 samples/sec Loss 0.7094 LearningRate 0.0000 Epoch: 19 Global Step: 247510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:12:58,063-Speed 3329.25 samples/sec Loss 0.7028 LearningRate 0.0000 Epoch: 19 Global Step: 247520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:01,189-Speed 3276.24 samples/sec Loss 0.6918 LearningRate 0.0000 Epoch: 19 Global Step: 247530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:04,291-Speed 3302.32 samples/sec Loss 0.7129 LearningRate 0.0000 Epoch: 19 Global Step: 247540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:07,456-Speed 3237.14 samples/sec Loss 0.6919 LearningRate 0.0000 Epoch: 19 Global Step: 247550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:10,549-Speed 3311.70 samples/sec Loss 0.7357 LearningRate 0.0000 Epoch: 19 Global Step: 247560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:13,639-Speed 3314.23 samples/sec Loss 0.6875 LearningRate 0.0000 Epoch: 19 Global Step: 247570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:16,743-Speed 3301.01 samples/sec Loss 0.6906 LearningRate 0.0000 Epoch: 19 Global Step: 247580 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:19,858-Speed 3287.97 samples/sec Loss 
0.7195 LearningRate 0.0000 Epoch: 19 Global Step: 247590 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:22,955-Speed 3307.24 samples/sec Loss 0.7122 LearningRate 0.0000 Epoch: 19 Global Step: 247600 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:26,072-Speed 3286.28 samples/sec Loss 0.7269 LearningRate 0.0000 Epoch: 19 Global Step: 247610 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:29,300-Speed 3172.92 samples/sec Loss 0.7085 LearningRate 0.0000 Epoch: 19 Global Step: 247620 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:32,442-Speed 3260.44 samples/sec Loss 0.7040 LearningRate 0.0000 Epoch: 19 Global Step: 247630 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:35,525-Speed 3322.10 samples/sec Loss 0.7452 LearningRate 0.0000 Epoch: 19 Global Step: 247640 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:38,624-Speed 3304.99 samples/sec Loss 0.7201 LearningRate 0.0000 Epoch: 19 Global Step: 247650 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:41,726-Speed 3302.51 samples/sec Loss 0.7084 LearningRate 0.0000 Epoch: 19 Global Step: 247660 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:44,832-Speed 3297.91 samples/sec Loss 0.6995 LearningRate 0.0000 Epoch: 19 Global Step: 247670 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:13:47,956-Speed 3279.02 samples/sec Loss 0.7136 LearningRate 0.0000 Epoch: 19 Global Step: 247680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:51,071-Speed 3287.69 samples/sec Loss 0.6486 LearningRate 0.0000 Epoch: 19 Global Step: 247690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:54,192-Speed 3281.95 samples/sec Loss 0.7327 LearningRate 0.0000 Epoch: 19 Global Step: 247700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:13:57,283-Speed 3315.11 samples/sec Loss 0.7312 LearningRate 0.0000 Epoch: 19 Global Step: 
247710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:00,343-Speed 3346.95 samples/sec Loss 0.7363 LearningRate 0.0000 Epoch: 19 Global Step: 247720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:03,503-Speed 3241.27 samples/sec Loss 0.7277 LearningRate 0.0000 Epoch: 19 Global Step: 247730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:06,644-Speed 3261.13 samples/sec Loss 0.7325 LearningRate 0.0000 Epoch: 19 Global Step: 247740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:09,718-Speed 3332.68 samples/sec Loss 0.7283 LearningRate 0.0000 Epoch: 19 Global Step: 247750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:12,863-Speed 3256.46 samples/sec Loss 0.6776 LearningRate 0.0000 Epoch: 19 Global Step: 247760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:15,968-Speed 3299.59 samples/sec Loss 0.7321 LearningRate 0.0000 Epoch: 19 Global Step: 247770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:19,056-Speed 3317.14 samples/sec Loss 0.7363 LearningRate 0.0000 Epoch: 19 Global Step: 247780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:22,179-Speed 3279.83 samples/sec Loss 0.7386 LearningRate 0.0000 Epoch: 19 Global Step: 247790 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:25,314-Speed 3267.13 samples/sec Loss 0.7262 LearningRate 0.0000 Epoch: 19 Global Step: 247800 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:28,453-Speed 3263.06 samples/sec Loss 0.7281 LearningRate 0.0000 Epoch: 19 Global Step: 247810 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:31,585-Speed 3270.71 samples/sec Loss 0.7191 LearningRate 0.0000 Epoch: 19 Global Step: 247820 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:34,736-Speed 3251.04 samples/sec Loss 0.7412 LearningRate 0.0000 Epoch: 19 Global Step: 247830 Fp16 Grad Scale: 8192 Required: 0 hours 
Training: 2022-04-27 23:14:37,920-Speed 3217.26 samples/sec Loss 0.7367 LearningRate 0.0000 Epoch: 19 Global Step: 247840 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:41,035-Speed 3287.90 samples/sec Loss 0.7176 LearningRate 0.0000 Epoch: 19 Global Step: 247850 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:44,176-Speed 3261.43 samples/sec Loss 0.7331 LearningRate 0.0000 Epoch: 19 Global Step: 247860 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:47,262-Speed 3319.55 samples/sec Loss 0.7029 LearningRate 0.0000 Epoch: 19 Global Step: 247870 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:50,448-Speed 3214.25 samples/sec Loss 0.7583 LearningRate 0.0000 Epoch: 19 Global Step: 247880 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:14:53,634-Speed 3215.16 samples/sec Loss 0.7418 LearningRate 0.0000 Epoch: 19 Global Step: 247890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:56,725-Speed 3314.48 samples/sec Loss 0.7541 LearningRate 0.0000 Epoch: 19 Global Step: 247900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:14:59,804-Speed 3327.02 samples/sec Loss 0.6751 LearningRate 0.0000 Epoch: 19 Global Step: 247910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:02,909-Speed 3298.60 samples/sec Loss 0.7410 LearningRate 0.0000 Epoch: 19 Global Step: 247920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:06,071-Speed 3239.76 samples/sec Loss 0.6958 LearningRate 0.0000 Epoch: 19 Global Step: 247930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:09,128-Speed 3350.68 samples/sec Loss 0.6941 LearningRate 0.0000 Epoch: 19 Global Step: 247940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:12,206-Speed 3327.94 samples/sec Loss 0.7276 LearningRate 0.0000 Epoch: 19 Global Step: 247950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:15,293-Speed 3318.17 
samples/sec Loss 0.7226 LearningRate 0.0000 Epoch: 19 Global Step: 247960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:18,362-Speed 3337.30 samples/sec Loss 0.6977 LearningRate 0.0000 Epoch: 19 Global Step: 247970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:21,462-Speed 3304.64 samples/sec Loss 0.7039 LearningRate 0.0000 Epoch: 19 Global Step: 247980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:24,553-Speed 3313.08 samples/sec Loss 0.7429 LearningRate 0.0000 Epoch: 19 Global Step: 247990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:27,664-Speed 3293.74 samples/sec Loss 0.7194 LearningRate 0.0000 Epoch: 19 Global Step: 248000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:30,762-Speed 3305.77 samples/sec Loss 0.7178 LearningRate 0.0000 Epoch: 19 Global Step: 248010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:33,900-Speed 3264.80 samples/sec Loss 0.7462 LearningRate 0.0000 Epoch: 19 Global Step: 248020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:37,023-Speed 3279.64 samples/sec Loss 0.7216 LearningRate 0.0000 Epoch: 19 Global Step: 248030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:40,180-Speed 3245.42 samples/sec Loss 0.7015 LearningRate 0.0000 Epoch: 19 Global Step: 248040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:43,315-Speed 3267.04 samples/sec Loss 0.7544 LearningRate 0.0000 Epoch: 19 Global Step: 248050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:46,433-Speed 3284.79 samples/sec Loss 0.6915 LearningRate 0.0000 Epoch: 19 Global Step: 248060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:49,536-Speed 3300.94 samples/sec Loss 0.7029 LearningRate 0.0000 Epoch: 19 Global Step: 248070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:52,719-Speed 3218.34 samples/sec Loss 0.7110 LearningRate 0.0000 
Epoch: 19 Global Step: 248080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:15:55,841-Speed 3282.01 samples/sec Loss 0.7163 LearningRate 0.0000 Epoch: 19 Global Step: 248090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:15:58,926-Speed 3320.17 samples/sec Loss 0.7276 LearningRate 0.0000 Epoch: 19 Global Step: 248100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:16:02,110-Speed 3216.91 samples/sec Loss 0.7691 LearningRate 0.0000 Epoch: 19 Global Step: 248110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-27 23:16:05,290-Speed 3221.30 samples/sec Loss 0.6804 LearningRate 0.0000 Epoch: 19 Global Step: 248120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:08,353-Speed 3344.28 samples/sec Loss 0.7440 LearningRate 0.0000 Epoch: 19 Global Step: 248130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:11,408-Speed 3352.27 samples/sec Loss 0.7041 LearningRate 0.0000 Epoch: 19 Global Step: 248140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:14,583-Speed 3225.71 samples/sec Loss 0.7067 LearningRate 0.0000 Epoch: 19 Global Step: 248150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:17,759-Speed 3225.45 samples/sec Loss 0.7346 LearningRate 0.0000 Epoch: 19 Global Step: 248160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:20,864-Speed 3299.03 samples/sec Loss 0.7203 LearningRate 0.0000 Epoch: 19 Global Step: 248170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:23,925-Speed 3346.59 samples/sec Loss 0.7066 LearningRate 0.0000 Epoch: 19 Global Step: 248180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:27,024-Speed 3305.85 samples/sec Loss 0.7494 LearningRate 0.0000 Epoch: 19 Global Step: 248190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:30,241-Speed 3183.57 samples/sec Loss 0.6842 LearningRate 0.0000 Epoch: 19 Global Step: 248200 Fp16 Grad 
Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:33,312-Speed 3335.02 samples/sec Loss 0.7442 LearningRate 0.0000 Epoch: 19 Global Step: 248210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:16:36,399-Speed 3318.14 samples/sec Loss 0.7261 LearningRate 0.0000 Epoch: 19 Global Step: 248220 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:16:39,513-Speed 3289.91 samples/sec Loss 0.7124 LearningRate 0.0000 Epoch: 19 Global Step: 248230 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:16:42,610-Speed 3307.88 samples/sec Loss 0.7437 LearningRate 0.0000 Epoch: 19 Global Step: 248240 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:16:45,710-Speed 3303.98 samples/sec Loss 0.6977 LearningRate 0.0000 Epoch: 19 Global Step: 248250 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:16:48,786-Speed 3329.61 samples/sec Loss 0.6934 LearningRate 0.0000 Epoch: 19 Global Step: 248260 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:16:51,878-Speed 3313.35 samples/sec Loss 0.7257 LearningRate 0.0000 Epoch: 19 Global Step: 248270 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:16:55,009-Speed 3271.15 samples/sec Loss 0.7196 LearningRate 0.0000 Epoch: 19 Global Step: 248280 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:16:58,069-Speed 3347.88 samples/sec Loss 0.7016 LearningRate 0.0000 Epoch: 19 Global Step: 248290 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:17:01,183-Speed 3289.09 samples/sec Loss 0.7220 LearningRate 0.0000 Epoch: 19 Global Step: 248300 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:17:04,335-Speed 3250.45 samples/sec Loss 0.7139 LearningRate 0.0000 Epoch: 19 Global Step: 248310 Fp16 Grad Scale: 8192 Required: 0 hours Training: 2022-04-27 23:17:07,428-Speed 3310.90 samples/sec Loss 0.7703 LearningRate 0.0000 Epoch: 19 Global Step: 248320 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 
23:17:10,479-Speed 3356.84 samples/sec Loss 0.7572 LearningRate 0.0000 Epoch: 19 Global Step: 248330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:13,570-Speed 3315.09 samples/sec Loss 0.6957 LearningRate 0.0000 Epoch: 19 Global Step: 248340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:16,692-Speed 3280.15 samples/sec Loss 0.7138 LearningRate 0.0000 Epoch: 19 Global Step: 248350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:19,808-Speed 3287.17 samples/sec Loss 0.7205 LearningRate 0.0000 Epoch: 19 Global Step: 248360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:22,865-Speed 3351.36 samples/sec Loss 0.7146 LearningRate 0.0000 Epoch: 19 Global Step: 248370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:25,969-Speed 3299.69 samples/sec Loss 0.7411 LearningRate 0.0000 Epoch: 19 Global Step: 248380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:29,071-Speed 3302.71 samples/sec Loss 0.7359 LearningRate 0.0000 Epoch: 19 Global Step: 248390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:32,145-Speed 3331.28 samples/sec Loss 0.7493 LearningRate 0.0000 Epoch: 19 Global Step: 248400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:35,426-Speed 3122.22 samples/sec Loss 0.6978 LearningRate 0.0000 Epoch: 19 Global Step: 248410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-27 23:17:38,502-Speed 3330.48 samples/sec Loss 0.7207 LearningRate 0.0000 Epoch: 19 Global Step: 248420 Fp16 Grad Scale: 32768 Required: -0 hours